Introduction

This project tests the hypothesis that breast cancer may express many RNA chimeras, not only
because fusion genes arise from chromosomal translocation but also because RNA transcripts
undergo abnormal trans-splicing. Many of these RNA chimeras may influence the behavior of
breast cancer via undiscovered mechanisms. We initially planned to establish the world's first
comprehensive list of breast cancer-specific fusion RNAs. However, last year we learned that
our European competitor, Dr. Rolf I. Skotheim, had just submitted a patent application for
making the same microarray chip (Patent application number: 20100279890;
http://www.faqs.org/patents/app/20100279890#b). Several hundred fusion RNAs on their array
list overlap with those on our list. While this development underscores the importance of this
project, it also forces us, for legal reasons, to forgo building a similar chip. Because
understanding which chimeric RNAs are formed in breast cancer, and how they are formed, remains
important for elucidating the mechanisms of breast cancer formation and progression, we have
continued to identify chimeric RNAs from different databases and have obtained several novel
findings in the past year, as summarized below.

Body 

Sequence  analyses  of  chimeric  RNAs  from  databases  (Task  1): 

Since last year's report, the number of putative chimeric RNAs in different databases has
continued to increase, which on the one hand provides us with more candidates but on the other
hand increases the burden of Task 1a. We have analyzed about 2,000 more expressed sequence tags
(ESTs) of putative chimeras in the past year. The three categories of chimeric or trimeric RNAs,
i.e. those with a gap, those with an overlap and those with two partner genes directly joined,
occur at frequencies similar to those reported last year. Also as reported last year, Task 1a
led to the serendipitous finding of trimeric RNAs, i.e. single RNA sequences containing elements
from three different partners. We have found more such trimeras and enlarged our database
(Table 1). Moreover, we identified several ESTs that contain elements of four genes, which we
term tetrameric RNAs or tetrameras (Fig. 1).

Table 1: Trimeric ESTs.

AI335862  BF826714  BF826602  BF764896  BF762577  BF744644  BF331329
BF306729  BE694080  AI924910  AU132130  BF109407  AV744183  AV729389
AV725012  BE814336  BE762537  BE876577  AW608255  BE715872  BE715869
BE715858  BE709675  BE694009  BE696199  BQ348968  AA514694  AW956968
BF803049  BF764896  BF331329  AU142287  BE172179  X93499    BM824189
BG995785  AW999004  BQ689257  BQ689139  BU539467  AI925024  BF814512
BG003110  R19361    BE898652  BC064904  M77198    BM915020  BI004882
BF995070  BF878278  BF109407  AW994480  BE716966  BE074730  BE876742
BE937759  BF987118  BM691077  BM703781  BX109950

Note:  Each  of  these  ESTs  contains  three  sequence  elements. 


BE762537 

TGATAGATTGGTCCAATTGGGTGTGAGGAGTTCAGTTATATGTTTGGGATTTTTTAGGTAGTGGGTGTTGAGCTTGAACGCTTTCTCGATGGGTGTC
GGCAAAGATCCAGGATAAGGAAGGCATTCCTCCTGATCAGCAGAGGTTGATCTTTGCCGGAAAACAGCTGgGAGGGATGCCTTCCTTGTCTTGGATC
TTTGCCTTGACATTCTCAATGGTGTCCTACTGTCTATTTCTGACCCAGCTCATGGAATTTTTTCATCTTATACTGAGCTCCAGAAAGGACGTAACTT
AGCATGGATCACCAATCAATCAAAAAATAAATAAATCACTAAGGATTGGAGAACTCATAGAACAAGGTGAAAGACATG

Fig. 1: BE762537 is a tetrameric RNA. Its first 86 nt (boldfaced) belong to the 2198-2283rd region of
the L-strand of mitochondrial (mt) RNA. The following 11-nt sequence (italicized and underlined) is an
unmatchable gap. Its 98-168th nt sequence is part of the alternatively spliced exon 2 of the UBC mRNA
from chromosome (chr) 12, which is followed by a 53-nt (the 168-220th) UBC antisense fragment. Both the
sense and antisense fragments of UBC, which overlap at the 168th nt (the lowercase "g"), have multiple
repeats in the UBC mRNA. The last 149 nt (the 221-369th nt) are part of the ENPP6 mRNA from chr 4.
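
The three junction categories used in Task 1a (gap, overlap, direct join) can be read off the
alignment coordinates of adjacent segments. The following is a minimal sketch only, not the
procedure used in this project. The names `classify_junction` and `describe` are hypothetical,
and the coordinates are the 1-based EST positions quoted in the Fig. 1 legend, with the UBC
antisense fragment assumed to span nt 168-220 (53 nt).

```python
def classify_junction(prev_end, next_start):
    """Classify the junction between two adjacent segments of an EST.

    Coordinates are 1-based, inclusive positions on the EST itself.
    Returns (category, size in nt).
    """
    if next_start == prev_end + 1:
        return "direct", 0
    if next_start > prev_end + 1:
        return "gap", next_start - prev_end - 1
    return "overlap", prev_end - next_start + 1


# Segments of BE762537 as (label, start, end), taken from the Fig. 1 legend.
# The antisense start of 168 is an assumption based on the stated 53-nt length.
SEGMENTS = [
    ("mt L-strand", 1, 86),
    ("UBC sense", 98, 168),
    ("UBC antisense", 168, 220),
    ("ENPP6", 221, 369),
]


def describe(segments):
    """Return (upstream, downstream, category, size) for each junction."""
    rows = []
    for (a, _, a_end), (b, b_start, _) in zip(segments, segments[1:]):
        category, size = classify_junction(a_end, b_start)
        rows.append((a, b, category, size))
    return rows


if __name__ == "__main__":
    for row in describe(SEGMENTS):
        print(row)
```

With these coordinates a single tetramera shows all three categories: an 11-nt gap, a 1-nt
overlap and a direct join.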

Identification of chimeras and trimeras that contain mitochondrial sequence

Task 1a also led to the serendipitous identification of some chimeric and trimeric RNAs that
contain mitochondrial (mt) RNA sequence (Table 2 in Appendix 1). If any of these ESTs truly
exists, it suggests that the nuclear RNA (nRNA) pool and the mtRNA pool can join together, a
previously unrecognized mechanism for enlarging the cellular RNA repertoire. Moreover, in over
50% of the mt-sequence-containing fusion RNAs, the mt element seems to belong to the light (L)
strand of mtDNA. This offers a possible answer to the long-standing puzzle of why most of the
L-strand transcript is non-coding: it may be an important source for fusion RNA formation. In
many ESTs, the nRNA components are large and do not belong to exons of known genes, implying
that a previously unrecognized role of long non-coding nRNAs may be to form fusion RNAs.


mtRNAs  also  undergo  cis-  and  trans-splicing 

Although cis- and trans-splicing are known to occur in mtRNAs of plants and some lower
organisms [1;2], only one chimeric mtRNA has been reported in human [3;4] and another in
mouse [5]. Because human mtDNA does not contain introns, mtRNAs are not supposed to undergo
cis-splicing [6]. However, we identified many mt-containing ESTs that contain two or more
linear mt segments, which is surprising because it suggests cis-splicing of human mtRNA
(Table 2 in Appendix 1). Moreover, some ESTs that contain only mt sequence, i.e. that are not
fusion RNAs, also show possible cis- or trans-splicing (Fig. 2).


Fig. 2: BE898652 is a trimeric RNA containing trans-spliced mt sequence. Its 3-64th nt region matches
the L-strand of the mtDNA (AC_000021), which is part of the antisense of the ATP8 gene from
mitochondria, while its 63-461st nt sequence matches the 8586-8986th nt sequence of the L-strand, which
is part of the antisense of the mt ATP6 gene. The two sequences overlap at the boldfaced lowercase "tg".
The last part of this EST (underlined) is part of the PSMC4 mRNA from chr 19.


mt sequences in fusion RNAs also have copies in chromosomes 1 and 5

Mitochondrial genes have many pseudogenes in the nuclear chromosomes, which are often referred
to as nuclear mitochondrial (NUMT) sequences. A total of 755 NUMTs have been identified, which
are distributed evenly across all 23 pairs of human chromosomes without obvious skewing toward
any chromosome [7;8]. However, to our surprise, the mt sequences found in ESTs have their NUMTs
more frequently in chromosomes 1 and 5 than in other chromosomes (Table 2 in Appendix 1).
Probably, evolutionary insertion of mtDNA into the nuclear genome is not a random event [9], and
those mt sequences that preferentially have a copy in chromosomes 1 and 5 may also
preferentially fuse to nRNAs. Whether and how insertion into the nuclear genome is connected to
fusion with nRNA is an intriguing question to be explored.

Evidence  for  a  novel  mechanism  for  how  two  RNAs  are  fused 

How two RNAs are fused is unknown. There are two major hypotheses. One is trans-splicing, i.e.
two RNAs are transcribed simultaneously and then spliced into one. The other is so-called
"transcription slippage", in which transcription of one gene slips to another gene. However, if
mt-sequence-containing fusion RNA exists, even if just a single one, it implies that either
nRNAs are imported into the mitochondria to fuse with mtRNA, or mtRNAs are imported into the
nucleus to fuse with nRNAs. Either situation indicates a previously unrecognized mechanism for
RNA fusion, because transport of nRNA from the nucleus to the mitochondria, or fusion of the
mtRNA to the nRNA, should happen after nuclear transcription has terminated and splicing has
been completed. Important additional support for this new mechanism is our finding that in some
ESTs the downstream partner is fused to the poly-A tail of the upstream partner (Fig. 3);
because polyadenylation occurs after pre-mRNA cleavage and completion of splicing, such a fusion
must take place after both. Moreover, it remains possible that this new mechanism operates
outside the nucleus, whereas both transcription slippage and trans-splicing of nRNA occur in the
nucleus [10].

AU142287 

CAGTGTGTGCCTCCCCGAGCCTCAGCCCCAAGCTGATTTCTTATCTGGAAATGGTACACTGAATTCTCTGGGTGGCTTTCTTGTGGCCCCATGGGA 
TGCAGCGTGGGGGCTGTCTGAAGGACCCTGCTTTTTCCAGGGGCCGAGGGGCTGCCTTTCCTTTAAAAAAAAAAAAAAAAA  3ATTTTTTACCTGAG 
TAGGCCTAGAAATAAACATGCTAGCTTTTATTCCAGTTCTAACCAAAAAAATAAACCCTCGTTCCACAGAAGCTGCCATCAAGTATTTCCTCACGC 
AAGCAACCGCATCCATAATCCTTCTAATAGCTATCCTCTTCAACAATATACTCTCCGGACAATGAACCATAACCAATACTACCAATCAATACTCAT 
CATTAATAATCATAATGGCTATAGCAATAAAACTAGGAATAGCCCCCTTTCACTTCTGAGTCCCAGAGGTTACCCAAGGCACCCCTCTGACATCCG 
GCCTGCTTCTTCTCACATGACAAAAACTAGCCCCCATCTCAATCATgcttttctctCTCTCTTTCACTGCAAGGCGGCGGCAGGAGAGGTTGTGGT 
GCTAGTTTCTCTAAGCCATCCAGTGCCATCCTCGTCGCTGCAGCGACACACGCTCTCGCCGCCGCCATGACTGAGCAGATGACCCTTCGTGGCACC 
CTCAAGGGCCACAACGGCTGGGTAACCCAGATCGCTACTACCCCGCAGTTCCCGGACATGATCCTCTCCGNCTCTCGAGATAAGACCATCATCATG 
TGGAAACTGACCAGGGATGAGACCAACTATGGAATTCCCAGCGGCTCTGCGGGGCACTNCCACTTTGTagngg 

Fig. 3: The 1-160th nt of AU142287 matches the 429-588th nt of the last exon of the POLDIP3 mRNA from
chr 22 (this exon should have 2248 bp), indicating an early transcriptional termination followed by
polyadenylation (boldfaced and underlined sequence). Following the poly-A sequence, the 178-526th nt
region (italicized grey sequence) is part of the ND2 mRNA from mitochondria. The 527-836th nt sequence
belongs to the first two exons of the GNB2L1 mRNA from chr 5, with the first 10 nt (underlined lowercase
letters) alternatively initiated from the -10 bp of the GNB2L1 gene. The last 85 nt (underlined) have a
few deletions and mismatches relative to the first 88 nt of exon 2 of GNB2L1. The last 5 nt (agngg) are
unmatchable and might belong to the cloning vector.
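
A fusion downstream of a poly-A tail appears in the EST as an internal run of A's followed by
more sequence. The sketch below shows one way to flag such runs. It is an illustration under
stated assumptions, not the screening method used in this project: the function name
`internal_poly_a` and the 15-nt threshold are hypothetical, and the test string is a short
excerpt of the AU142287 listing in which one unreadable character is written as `N`.

```python
import re


def internal_poly_a(seq, min_run=15):
    """Find A-runs of at least `min_run` nt that are followed by more sequence.

    Returns a list of (start, end, length) with 1-based, inclusive
    coordinates relative to `seq`.
    """
    hits = []
    for m in re.finditer(r"A{%d,}" % min_run, seq.upper()):
        if m.end() < len(seq):  # skip a run that simply ends the read
            hits.append((m.start() + 1, m.end(), m.end() - m.start()))
    return hits


# Excerpt around the junction in AU142287: the last 16 nt of the POLDIP3
# segment, the 17-nt poly-A run, one unreadable base (written as N, an
# assumption), and the first 14 nt of the ND2 segment.
EXCERPT = "GGGCTGCCTTTCCTTT" + "A" * 17 + "N" + "ATTTTTTACCTGAG"

if __name__ == "__main__":
    print(internal_poly_a(EXCERPT))
```

Coordinates are relative to the excerpt; in the full EST the same run corresponds to nt 161-177.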


Identification  of  a  new  spliced  mtRNA 

While conducting Tasks 2 and 3, we found that most ESTs could not be detected in the cDNA
libraries we constructed from 15 pairs of breast cancer and adjacent normal tissue samples. This
reinforces our previous conclusion that the majority of ESTs may be technical artifacts, while
the majority of the truly existing chimeric RNAs are derived from chromosomal translocation.
Several possible mechanisms for the formation of such artifacts are proposed in our new
publication (Appendix 2). In addition, during experimental verification of ESTs, we identified a
novel mtRNA derived from cis-splicing (Fig. 4).


[Fig. 4 sequence image not recoverable from the source; its labels mark the reverse primer and the
forward primer.]


Fig. 4: RT-PCR followed by T-A cloning and sequencing identifies a spliced mtRNA from HEK293 cells
in which the 2069-2836th nt region of the mtRNA was deleted during splicing, as shown by alignment with
mtDNA in the UCSC browser. The sequence is reverse-complementary to mtDNA. The boldfaced region is the
downstream mt exon. Both the forward and reverse primers are underlined. Nucleotide (nt) positions in
the mtDNA are based on the UCSC browser, with the positions of the last nt in the reverse primer and the
first nt in the forward primer indicated. The sequences before the reverse primer and after the forward
primer belong to the cloning vector.


Technical innovations

While conducting Tasks 2 and 3, we encountered two major technical difficulties in the
experimental verification and detection of RNA. One is that many genes are expressed together
with their antisense transcripts, in line with the literature report that the RNA transcripts of
over 63% of genes are accompanied by antisense counterparts [11]. The other is the surprising
finding that routine reverse transcription (RT) of RNA can be primed by endogenous primers
present in the RNA samples. It took great effort to overcome these obstacles by establishing two
new cloning methods and a novel method called the "cDNA protection assay", which replaces the
traditional RNA protection assay for verifying the existence of an RNA. All these methods are
now published in RNA Biology (Appendix 2).

It is worth mentioning that our novel strategy of detecting RNA by protecting cDNA has several
merits. A DNA/RNA hybrid has a unique structure and composition that distinguish it from a
DNA/DNA or RNA/RNA hybrid, in part because DNA/DNA contains dA and dT, RNA/RNA contains rA and
rU, while DNA/RNA contains all four. These differences should provide unique strategies for
developing sensitive methods and instruments to detect and quantify DNA/RNA hybrids of very low
abundance. Such strategies should be feasible, and are thus intriguing, because endogenous
DNA/RNA hybrids in eukaryotic cells are far fewer than DNA/DNA and RNA/RNA hybrids, especially
when a larger DNA/RNA fragment is designed for protection.

Key  Research  Accomplishments 

1. In the past two years we have analyzed over 70,000 putative chimeras from different
databases, many more than we originally planned. However, RT-PCR verification suggests that the
majority of them may be artifacts; the major technical reasons are described in our publication
(Appendix 2).

2.  We  have  identified,  for  the  first  time,  mt-sequence  in  chimeric,  trimeric  and  even  tetrameric 
RNAs,  suggesting  that  mtRNAs,  especially  those  transcribed  from  the  non-coding  regions  of  the 
L-strand,  may  be  combined  with  nuclear  RNA  to  enlarge  the  RNA  repertoire. 

3.  We  found  that  human  mtRNAs  also  undergo  cis-  and  trans-splicing,  and  have  identified  a 
novel  cis-splicing  derived  mtRNA. 

4. We have established several cloning methods and a new strategy for verifying the existence
of an RNA, termed the "cDNA protection assay". These methods are published (Appendix 2).

5. The reference gene methods mentioned in last year's report have also been published
(Appendix 3).

6. The PI (Dr. Liao) and the postdoctoral fellows whose salaries and positions are supported by
this grant have also published several other papers (Appendix 4).

Reportable outcomes

1. A longer list of trimeric RNAs has been compiled as a database.

2. A paper was published presenting new reference genes and primers for RT-PCR methods.

3. A paper was published describing new cloning methods and RNA verification.

4. Two postdoctoral fellows are supported by the funds and have published in the past year.

Conclusion 

So far, probably over a million putative chimeric RNAs have been reported in the literature or
deposited in different databases. However, the vast majority of these chimeras remain unverified
and are therefore still of little meaning to us. For this reason, our work to verify them and to
clone their full-length sequences is important. After analyzing over 70,000 chimeric sequences
and determining the expression status of about 3,000 selected candidates, we conclude that the
majority of putative chimeric RNAs in different databases may be technical artifacts. In a
publication we propose the major technical reasons for how the artifacts arise. We also conclude
that most truly existing chimeras are associated with a corresponding change in the genome.
Among the chimeric, trimeric and even tetrameric RNAs that truly exist and arise at the RNA
level, mitochondrial RNAs, especially those transcribed from the non-coding regions of the
L-strand, participate in their formation. In other words, human mitochondrial RNAs also undergo
cis- and trans-splicing and fuse with nuclear RNAs to enlarge the cellular RNA repertoire, which
implies a previously unrecognized mechanism for RNA fusion that may occur in the cytoplasm
rather than the nucleus.
Table 2: Chimeric, trimeric or tetrameric ESTs that contain mitochondrial sequence

 #  Access #    M strand/region                     M-span (nt)  NUMT             Partner    Fusion
 1  DB324119.1  F7212-7515                          304          1                M-10       Chimera
 2  BE898652.1  F8462-8527, F8569-8969              62, 401      1, 1             M-M-19     Trimera
 3  AU142287.1  H4547-4895                          349          1                22-M-5     Trimera
 4  BF378297.1  L8443-8642                          200          1                M-22       Chimera
 5  BE716966.1  H7587-7790                          204          1                M-1-11     Trimera
 6  BE899119.1  H8399-8936                          538          1                M-22       Chimera
 7  AV744183.1  H4333-4399                          67           1                M-19-9     Trimera
 8  BF762577.1  F6897-7014, F9645-9779              118, 135     1, 1             7-M-M      Trimera
 9  BF764896.1  H11165-11248                        84           5                2-M-6      Trimera
10  BE709292.1  F10705-10865                        161          5                M-7        Chimera
11  BE709354.1  F10705-10864                        160          5                M-7        Chimera
12  BE899559.1  H10677-11170                        494          5                6-M        Chimera
13  BF083248.1  H14061-14231                        171          5                10-M       Chimera
14  BF083272.1  H14061-14233                        173          5                10-M       Chimera
15  AV751897.1  H10511-10874                        364          5                M-19       Chimera
16  BF306729.1  H10637-10864                        227          5                5-M        Chimera
17  BF744644.1  H15540-15721, H14092-14295          182, 204     5                M-M-11     Trimera
18  BE762537.1  F2198-2283                          86           17               M-12-12-4  Tetramera
19  BE876577.1  H10689-11361, H84455-8680           493, 226     5, 1             M-M-11     Trimera
20  AV702773.1  H2639-3082                          444          17               14-M       Chimera
21  AA514694    F9176-9208                          33           1                M-8-8      Trimera
22  AA581515    F7399-7453                          55           1, 17            M-17       Chimera
23  AA679609    F7399-7520                          122          1                M-22       Chimera
24  AA679609    L7399-7520 (7AA679609)              122          1                M-22       Chimera
25  AI925024    H2037-2239                          203          3, 11, 5         14-M-13    Trimera
26  AW134795    L1619-1671                          53           11               M-1        Chimera
27  AW370799    H8966-9070                          105          1, 5             19-M       Chimera
28  AW753072    L14400-14455                        56           18, 5, 17, 5     M-9        Chimera
29  AW821349    F12015-12081                        67           5                M-16       Chimera
30  AW898803    H2647-2773                          127          None             X-M        Chimera
31  AW950200    H7276-7520                          245          1                5-M        Chimera
32  BE074730    F6707-6844                          48           1, X, 2          M-6-17     Trimera
33  BE162186    H5117-5322                          205          1                M-13       Chimera
34  BE876742    H7398-7457                          60           1                19-M-1     Trimera
35  BE937759    F9080-9144                          65           1                9-M        Chimera
36  BF852160    L4703-4822                          120          1                17-M       Chimera
37  BF987118    H6568-6604                          37           1                M-21-7     Trimera
38  BF988359    F6876-6938, F6950-7025              63, 76       1, 1             7-M-M      Chimera
39  BG995785    H10473-10589, F15168-15217          117, 50      5, 5             M-M-1      Trimera
40  BM691077    H14103-14159                        57           5                17-M-1     Trimera
41  BM703781    H9567-9605                          39           1                4-M        Chimera
42  BM997144    F7189-7520                          332          1                M-2        Chimera
43  BP348380    H2852-3018                          167          5                19-M       Chimera
44  BQ300150    H8936-8995                          60           1                3-M        Chimera
45  BQ348968    F12338-12505                        168          None             M-8-2      Trimera
46  BQ638079    H11651-11698                        48           5, 5             5-5-5-M    Tetramera
47  CV385666    F2620-2837                          218          11               M-7        Chimera
48  DA086571    H7201-7521                          321          1                M-1        Chimera
49  DA182598    H2417-2733                          317          11, 3, 6, 17, 5  M-11       Chimera
50  DA365070    H1613-1672                          60           11, 7, 5         M-7        Chimera
51  DA511096    H7192-7533                          342          1                M-1        Chimera
52  DA757571    H2421-2762                          342          None             M-1        Chimera
53  DB314922    F7393-7515                          123          1                M-1        Chimera
54  DB324119    F7212-7515                          304          1                M-10       Chimera
55  AW994480    F2654-2804, F2814-2880, L2894-298'  63, 67, 91   None             M-2-8      Trimera
56  BE694080    L2654-2984 (7AW994480)              63, 67, 91   None             M-2-8      Trimera
57  BF826602    L11053-11116                        64           5, 5             12-15-M    Trimera


Note:  The  first  and  last  nt  positions  at  the  mtDNA  of  an  mt  sequence  (M)  are  indicated,  based  on 
UCSC  browser,  while  its  length  (span  in  the  number  of  nt)  may  not  always  be  calculated  due  to 
possible  deletion  of  several  nt.  The  chromosome  or  chromosomes  that  harbor  an  NUMT  homologous 
to  the  mt  sequence  are  indicated.  The  order  of  each  partner  in  the  chimera,  trimera  or  tetramera  is 
shown  in  the  5'-to-3'  orientation. 
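
The skew of NUMT locations toward chromosomes 1 and 5 can be checked by tallying the NUMT column
of Table 2. The following is a minimal sketch rather than the analysis performed for this
report. The name `rows_per_chromosome` is hypothetical, the column is transcribed by hand from
the 57 table rows in order, and comma-separated entries such as "5,5" are assumed to list
chromosomes, each counted once per row.

```python
from collections import Counter

# NUMT column of Table 2, rows 1-57 in order; ";" separates rows.
# Hand-transcribed from the table, so treat the values as an assumption.
NUMT_COLUMN = (
    "1; 1,1; 1; 1; 1; 1; 1; 1,1; 5; 5; "                    # rows 1-10
    "5; 5; 5; 5; 5; 5; 5; 17; 5,1; 17; "                    # rows 11-20
    "1; 1,17; 1; 1; 3,11,5; 11; 1,5; 18,5,17,5; 5; None; "  # rows 21-30
    "1; 1,X,2; 1; 1; 1; 1; 1; 1,1; 5,5; 5; "                # rows 31-40
    "1; 1; 5; 1; None; 5,5; 11; 1; 11,3,6,17,5; 11,7,5; "   # rows 41-50
    "1; None; 1; 1; None; None; 5,5"                        # rows 51-57
)


def rows_per_chromosome(column):
    """Count, for each chromosome, how many table rows list a NUMT on it."""
    counts = Counter()
    for cell in column.split(";"):
        # A set counts each chromosome once per row, even if listed twice.
        chroms = {c.strip() for c in cell.split(",")} - {"None", ""}
        counts.update(chroms)
    return counts


if __name__ == "__main__":
    counts = rows_per_chromosome(NUMT_COLUMN)
    n_rows = len(NUMT_COLUMN.split(";"))
    for chrom in ("1", "5"):
        print(f"chr {chrom}: {counts[chrom]} of {n_rows} rows")
```

The tally is per table row, so an accession that appears in two rows is counted twice.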
We established new methods for cloning cDNA ends that start with reverse transcription (RT) and soon proceed with
the  synthesis  of  the  second  cDNA  strand,  avoiding  manipulations  of  fragile  RNA.  Our  3'-end  cloning  method  does  not 
involve  poly-dT  primers  and  polymerase  chain  reactions  (PCR),  is  low  in  efficiency  but  high  in  fidelity  and  can  clone 
those  RNAs  without  a  poly-A  tail.  We  also  established  a  cDNA  protection  assay  to  supersede  RNA  protection  assay.  The 
protected  cDNA  can  be  amplified,  cloned  and  sequenced,  enhancing  sensitivity  and  fidelity.  We  report  that  RT  product 
using  gene-specific  primer  (GSP)  cannot  be  gene-  or  strand-specific  because  RNA  sample  contains  endogenous  random 
primers  (ERP).  The  gene-specificity  may  be  improved  by  adding  a  linker  sequence  at  the  5'-end  of  the  GSP  to  prime  RT 
and  using  the  linker  as  a  primer  in  the  ensuing  PCR.  The  strand-specificity  may  be  improved  by  using  strand-specific 
DNA  oligos  in  our  protection  assay.  The  CDK4  mRNA  and  TSPAN31  mRNA  are  transcribed  from  the  opposite  DNA  strands 
and  overlap  at  their  3'  ends.  Using  this  relationship  as  a  model,  we  found  that  the  overlapped  sequence  might  serve  as  a 
primer  with  its  antisense  as  the  template  to  create  a  wrong-template  extension  in  RT  or  PCR.  We  infer  that  two  unrelated 
RNAs  or  cDNAs  overlapping  at  the  5'-  or  3'-end  might  create  a  spurious  chimera  in  this  way,  and  many  chimeras  with  a 
homologous  sequence  may  be  such  artifacts.  The  ERP  and  overlapping  antisense  together  set  complex  pitfalls,  which 
one  should  be  aware  of. 
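
The linker strategy described above, in which a linker is placed at the 5'-end of the GSP and
the linker alone is then used as a PCR primer, can be illustrated with the primer sequences
listed in Table 1 of this paper. This is a sketch under an assumption: that NewDCF933 and
NewDTF647 are the NewD linker joined to CDK4F933 and TSPAN31F647, respectively, which is
inferred here from the primer names and sequences. The function name `linker_gsp` is
hypothetical.

```python
# Sequences (5'->3') as read from Table 1 of the paper.
NEW_D = "TCAGGATTGATGGTGCCTACAGC"        # linker; also usable alone as a PCR primer
CDK4_F933 = "GATGACTGGCCTCGAGATGT"       # gene-specific primer (GSP)
TSPAN31_F647 = "CTTAAGCATTCAGACGAAGC"    # gene-specific primer (GSP)

NEW_DCF933 = "TCAGGATTGATGGTGCCTACAGCGATGACTGGCCTCGAGATGT"
NEW_DTF647 = "TCAGGATTGATGGTGCCTACAGCCTTAAGCATTCAGACGAAGC"


def linker_gsp(linker, gsp):
    """Join a linker to the 5' end of a gene-specific primer."""
    return linker + gsp


def split_linker(primer, linker):
    """Return the gene-specific part of a linker-GSP primer, or None."""
    if primer.startswith(linker):
        return primer[len(linker):]
    return None


if __name__ == "__main__":
    print(linker_gsp(NEW_D, CDK4_F933) == NEW_DCF933)
    print(split_linker(NEW_DTF647, NEW_D))
```

Because every product primed by such an oligo carries the linker at its 5' end, a later PCR
primed by the linker should amplify only those products, which is the rationale given for the
improved gene-specificity.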


Introduction 

A recent advance in RNA research suggests that virtually the
entire non-repeat part of the human genome is transcribed,
at least at some times or in some cell types [7,17], with a tally of
161,000 transcripts so far [25]. Moreover, it is estimated that over
63% of RNA transcripts are accompanied by antisense counterparts [24],
and the Unigene database of the National Center
for Biotechnology Information (NCBI) contains over 123,000
human antisense entries [26]. One meaning of these figures is that,
for most genomic loci, both strands of the DNA double helix are
transcribed [6,19]. Most antisense transcripts may be non-coding,
but there are still many that do encode proteins. For example, the
DNA strand opposite to the one encoding the THRA (17q11.2),
CDK4 (12q14.1), CCND1 (11q13) and GAPDH (12p13)
genes harbors the NR1D1, TSPAN31, LOC100996515 and
LOC100996336 protein-coding genes, respectively, as shown in
the NCBI. Cloning cDNA often involves reverse transcription
(RT) and polymerase chain reactions (PCR), but the situation
wherein antisense is also expressed often sets pitfalls and hurdles,
which are widely neglected, in our way of cloning the 5'-
or 3'-end of cDNA or determining from which DNA strand an
RNA is transcribed. For instance, it may not be easy to clone the
5' and 3' ends of the so-called ncRNACCND1 [43] and to determine
whether it is transcribed from the same strand as the CCND1 or
as the LOC100996515.

Another ribonomic advance suggests that transcripts from
about 65% of the human genes form chimeric RNA with a transcript
from another gene. This other gene in most cases is nearby
on the same chromosome but can also be located on another
chromosome [7,17]. Actually, modern RNA-sequencing technologies
have provided us with thousands of RNA chimeras [14]. A tiny
number of them are known to be transcribed from fusion genes
that are formed due to genetic alterations, such as chromosomal

Correspondence  to:  Yongming  Liu  and  Joshua  Liao;  Email:  liuym@glmc.edu.cn  and  djliao@hi.umn.edu 
Submitted:  02/22/13;  Revised:  03/31/13;  Accepted:  04/05/13 
http://dx.doi.org/10.4161/rna.24570

www.landesbioscience.com

RNA Biology


[Figure 1 image not recoverable from the source. Legible panel text: (B) mRNA without poly-A; RT with
random primers; 1st strand cDNA; forward primer; 3' end of 2nd strand; S1 nuclease digests overhang;
blunt ends; add A-overhang by Taq polymerase; T-A cloning. (C) CDK4 mRNA with primers F136, F665 and
R1086. (D) Sequence:
CCTGGGCAACAGAGCAAGACTCTGTGTCAAAA
AAAAAAAAAGAATATAGATTTTTAAATGGCAA
AAAAAAAAAAAAAAAAaATCACTAGTGCGGCC
GCCTGCAGGTCGACCATATGGGAGAGCTCC
(E-G) Gels with size markers of 1 kb, 850 bp, 650 bp, 500 bp and 400 bp; panels labeled no S1, 10 u S1
and 15 u S1.]

Figure 1. Cloning RNA 3' end. (A) Two hurdles for cloning the 5' or 3' end, and for PCR amplification, of chimeric cDNA: (1) Gene-specific primer (GSP)
used in RACE amplifies the 5' or 3' end of not only the chimeric cDNA but also the cDNA of a parent gene (black or red line). (2) Forward (F) or reverse
(R) primer primes not only the chimeric cDNA but also the cDNA of a parent gene, making the first several cycles of PCR less efficient. (B) Our strategy
for cloning RNA 3' end: After RT with random hexamers, a forward primer of the gene of interest and Taq are used to synthesize the 3' part of the
second cDNA strand. S1 is added to cut the 3'-overhang of the first strand. The cDNA blunt ends are then appended with a dA by Taq, followed by T-A
cloning. (C) Illustration of the locations of primers and the S1 cutting site on the CDK4 mRNA. (D) Part of the 3' sequence obtained, in which the
lowercase "a" is added by Taq and the underlined sequence belongs to the T-A vector. The sequence matches completely to the CDK4 mRNA. Note that
there is an internal poly-A sequence 21 nt upstream of the authentic poly-A tail. (E-G) Both pairs of primers (F665+R1086 and F136+R1086) designed
to amplify the 5' and the middle regions of the mRNA, respectively, can amplify the RT product (cDNA) of RNA from HeLa cells without S1 digestion
(arrowheads). F665+R1086 can still yield a band from the cDNA digested with 10 or 15 units of S1, while F136+R1086 yielded only a very faint band
(arrow) when 10 units were used and no band when 15 units were used.


translocation and genomic DNA deletion or amplification [9].
Unfortunately, the vast remaining majority, i.e., those not associated
with a known genomic alteration, remain putative and
are not very meaningful to us so far, because their existence has
not been verified with a rigorous method and because their full-length
sequence has not been cloned and, thus, their open reading
frame is unclear [16]. These weaknesses are due mainly to the lack
of reliable and efficient approaches to cloning and verification.
Current RNA sequencing technologies are reliant on RT or on
principles similar to RT [13], provide only short sequences, and
have poor strand-specificity [35], and are thus suitable only for screening,
but not for verification, of long RNA. Cloning methods involving
RT-PCR may result in artificial chimeric cDNA [8,11,23,30,35,36,38],
in part because template switching may occur during RT [11,23,33]
and mis-priming can occur in PCR. RNA protection assay is
the most commonly used method to verify the true existence of
an RNA, in which an in vitro synthesized complementary RNA
(cRNA) is used to hybridize with the parental RNA in solution,
followed by RNase digestion of the non-hybridized RNA.


This method does not involve RT or PCR, which on one hand
increases its reliability, but on the other hand makes the method
very inefficient [13]. Also problematically, the protected RNA cannot
be directly sequenced to confirm its identity. For these reasons,
most attempts to verify chimeric RNAs still, unfortunately,
use problematic RT-PCR. Moreover, one needs to distinguish a true
chimeric cDNA end cloned with a routine 5' or 3' RACE (rapid
amplification of cDNA end) method from the cDNA end of the
parent mRNA, as depicted in Figure 1A. Actually, because in
most cases the mRNA of each parent gene is more abundant, the
cloned cDNA end is more likely to belong to the parent mRNA.

We attempt to develop methods that are devoid of the above-described
weaknesses in cloning and verifying chimeric or antisense-accompanied
RNA. Although not yet reaching this aim,
we have established new methods for cloning cDNA ends and
have established a cDNA protection assay to supersede RNA
protection assay, as described in this report. Sometimes "RNA,"
but not "mRNA," is used herein because our methods are also
suitable for cloning those long RNAs and chimeric RNAs that
Table 1. Primers used

Primer         Sequence
NewA           5'-GTGGAGTCTACGCGAACTTGTCCT17-3'
NewC           5'-GTGGAGTCTACGCGAACTTGTCC-3'
NewB mixture   5'-TCAGGATTGATGGTGCCTACAGC13V-3' (V = A,G,T)
NewD           5'-TCAGGATTGATGGTGCCTACAGC-3'
HPRT1F123      5'-CTTCCTCCTCCTGAGCAGTC-3'
HPRT1R683      5'-AACACTTCGTGGGGTCCTTT-3'
CCND1F70       5'-TAGCAGCGAGCAGCAGAGTC-3'
CCND1F183      5'-CCCAGCTGCCCAGGAAGAGC-3'
CCND1R981      5'-TTGACTCCAGCAGGGCTTCG-3'
CCND1R1067     5'-TGTGCAAGCCAGGTCCACCT-3'
CDK4F136       5'-GTATGGGGCCGTAGGAACCG-3'
CDK4F665       5'-TCTGGTGACAAGTGGTGGAA-3'
NewDCF933      5'-TCAGGATTGATGGTGCCTACAGCGATGACTGGCCTCGAGATGT-3'
NewDTF647      5'-TCAGGATTGATGGTGCCTACAGCCTTAAGCATTCAGACGAAGC-3'
CDK4R822       5'-TCCACATGTCCACAGGTGTTGC-3'
CDK4F933       5'-GATGACTGGCCTCGAGATGT-3'
CDK4R1086      5'-AGGCAGAGATTCGCTTGTGT-3'
CDK4F1096      5'-TGCAGCACTCTTATCTACATAAGGAT-3'
TSPAN31F73     5'-AAGCTGTCGGGGTCCTGGAA-3'
TSPAN31F647    5'-CTTAAGCATTCAGACGAAGC-3'
TSPAN31R860    5'-ACCCTAGATATTCCCTAAGG-3'
TSPAN31R1668   5'-CTTGGAAGAAGGGACTTTCC-3'
MYCF125        5'-GCGCTGAGTATATAAAAGCCGGTT-3'
MYCR838        5'-CCACCGCCGTCGTTGTCTCC-3'
BCAS4E1F       5'-TCCTGATGCTGCTCGTGGAC-3'
BCAS3E25R      5'-CATACACAGGGACCGAGCTT-3'

Note:  The  number  in  the  primer  indicates  the  first  (for  forward)  or  the  last  (for  reverse)  nucleotide  of  that  primer  in  the  position  of  the  mRNA.Thus,  the 
range  between  the  F  and  R  numbers  should  normally  be  the  size  of  the  RT-PCR  amplified  DNA  fragment  in  agarose  gel. 


are  non-coding.  Some  pitfalls  and  artifacts  of  RT  and  PCR  that 
are  widely  neglected  in  the  literature  are  also  described  to  alert 
the  peers. 
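
The naming rule given in the note to Table 1 can be used to predict band sizes before running a gel. The following Python sketch is only an illustration and is not part of the methods reported here: the helper names `parse_primer` and `expected_amplicon_size` are hypothetical, and the sketch assumes that both primer positions are counted inclusively, so that the expected size is R - F + 1.

```python
import re

# Primer names follow the Table 1 convention: <gene><F|R><mRNA position>.
PRIMER_NAME = re.compile(r"^(.+?)([FR])(\d+)$")


def parse_primer(name):
    """Split a primer name such as 'CDK4F136' into (gene, direction, position)."""
    match = PRIMER_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized primer name: {name}")
    gene, direction, position = match.groups()
    return gene, direction, int(position)


def expected_amplicon_size(forward, reverse):
    """Expected RT-PCR product size in bp (assumption: inclusive positions)."""
    f_gene, f_dir, f_pos = parse_primer(forward)
    r_gene, r_dir, r_pos = parse_primer(reverse)
    if f_dir != "F" or r_dir != "R":
        raise ValueError("expected a forward primer followed by a reverse primer")
    if f_gene != r_gene:
        raise ValueError("primers belong to different genes")
    if r_pos < f_pos:
        raise ValueError("reverse primer lies upstream of the forward primer")
    return r_pos - f_pos + 1


# Primer pairs used for the no-primer RT controls described in this report.
for pair in [("CDK4F136", "CDK4R822"),
             ("CCND1F70", "CCND1R1067"),
             ("HPRT1F123", "HPRT1R683")]:
    print(pair, expected_amplicon_size(*pair), "bp")
```

Primer names that do not end in a position number (for example, BCAS4E1F) are rejected by this sketch rather than guessed at.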

Results 

Cloning RNA 3' end. The 3' end of long RNAs is usually cloned
by using a poly-dT oligo to prime the poly-A tail of the RNA or
by ligating a linker sequence to the 3' end, since about 30% of
the long RNAs lack a poly-A tail,37 although they likely have a
poly-A signal. In our strategy, RNA can be primed with random
hexamers in RT. Taq DNA polymerase (Taq for brevity) and a
forward primer of the gene of interest are used to synthesize the
3' part of the second cDNA strand, which is also the 3' part of
the parental RNA. S1 nuclease (S1 for brevity) is added to cut
off the 3'-overhang of the 1st cDNA strand (Fig. 1B), since S1
digests single-stranded, but not double-stranded, DNA or RNA.
The blunt ends of the double-stranded cDNA are then appended
with a dA by Taq to allow cloning of the fragment into a T-A vector
(Fig. 1B).

As an example, unDNased RNA from HeLa cells was converted
in RT to the first cDNA strand by random hexamers. The
CDK4F665 forward primer (all primers are listed in Table 1) and
PCR Mastermix were mixed with the RT product to synthesize
the second strand of CDK4 cDNA by incubation at 72°C for
10 min. S1 was added to cut off the single-stranded part of the 1st
cDNA strand upstream of the F665 primer (Fig. 1C). After inactivation
of S1 and purification of the double-stranded cDNA,
PCR of the cDNA with F136+R1086 primers did not yield a signal,
which confirmed that the region upstream of F665 (including
the F136 sequence) had been digested by S1, while PCR with
F665+R1086 yielded a band (Fig. 1E-G), indicating that the
double-stranded part could withstand the S1 digestion. The amount of
S1 might need to be optimized for different target genes, due
to differences in the residual RNAs and single-stranded
cDNAs to be digested, since herein 10 units of S1 was not sufficient
(Fig. 1F). A portion of the purified double-stranded cDNA
was then cloned into a T-A vector. Sequencing a resultant plasmid
and aligning the sequence with the CDK4 mRNA sequence
(NM_000075.3) revealed that the canonical 3' end, including
the whole poly-A tail (Fig. 1D), was fully cloned.
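
The reasoning behind the two diagnostic PCRs can be summarized in a toy model: after second-strand synthesis from the forward primer and S1 digestion, only the region from that primer to the 3' end remains double-stranded, so a primer pair should amplify only if both primers lie within that region. The Python sketch below illustrates this under simplifying assumptions (complete digestion of single strands, no loss of duplex); the function name `survives_s1` and the 3'-end coordinate of 1400 are hypothetical placeholders, not values from this report.

```python
def survives_s1(primer_pair, protected_start, protected_end):
    """Return True if both primer positions lie in the double-stranded region.

    primer_pair: (forward_position, reverse_position) on the mRNA.
    protected_start, protected_end: mRNA coordinates of the region that is
    double-stranded after second-strand synthesis and thus resists S1.
    """
    forward, reverse = primer_pair
    return protected_start <= forward <= reverse <= protected_end


# Second strand primed at CDK4 F665; the end coordinate is a placeholder.
PROTECTED = (665, 1400)

# F136 lies in the digested single-stranded part, so no product is expected.
print("F136+R1086:", survives_s1((136, 1086), *PROTECTED))
# Both primers lie in the duplex, so a band is expected.
print("F665+R1086:", survives_s1((665, 1086), *PROTECTED))
```

The model ignores incomplete digestion, which is the situation observed with 10 units of S1 and the reason the enzyme amount may need optimization.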

Cloning RNA 5' end by G-tailing. Tailing the 3' end of a
DNA with terminal deoxynucleotidyl transferase (TdT) is a traditional
method used for several different purposes, including cloning
of the 3' end of a cDNA. We prime the RNA of interest with
a reverse primer in RT to convert the RNA to the first cDNA
strand (Fig. 2). After removal of dNTP and short oligos, TdT
and dGTP are used to append a poly-dG tail to the 3' end of the
cDNA, which corresponds to the RNA 5' end. A poly-dC mixture is used to
prime the poly-dG for synthesis of the second cDNA strand. This
poly-dC mixture, referred to as NewB, contains four oligos with
a linker sequence (dubbed NewD) at the 5' end and with one
of the four bases at the 3' end, so that one of the four oligos can
be anchored on the last nucleotide (nt) of the cDNA. A reverse
primer and NewD are then used in PCR to amplify the double-stranded
3' part of the cDNA (Fig. 2).

As an example, unDNased RNA from HeLa cells was converted
in RT with an HPRT1 reverse primer (R683) to the first
cDNA strand, followed by removal of dNTP, primers and other
short oligos by running the reaction through a RapidTip2 and
then by washing and precipitation with ethanol. The cDNA
was then tailed with a poly-dG using TdT and dGTP. NewB
was used to prime the poly-dG tail for synthesis of the second
cDNA strand. NewD and R683 were used as the primers in
PCR to amplify the double-stranded part of the HPRT1 cDNA, which
resulted in a fuzzy band. Purification of this fuzzy band from
agarose gel followed by a second PCR resulted in a clear band of
the correct size, which was cloned into a T-A vector. Sequencing
one resultant plasmid followed by alignment of the sequence with
the HPRT1 mRNA sequence (NM_000194.2) confirmed that a
NewB primer had indeed anchored on the last nt of the HPRT1
cDNA (Fig. 2).

[Figure 2 schematic: mRNA, RT to the first-strand cDNA, TdT tailing with poly-dG, second-strand synthesis primed by NewB, and PCR with NewD and a reverse primer; gel images not reproduced. Sequence of the cloned fragment:]

GCTCCGGCCGCCATGGCCGCGGGATtTCAGGATTGATGGTGCCTACAGCCCCCCCCCCCCC
gGCGGGGCCTGCTTCTCCTCAGCTTCAGGCGGCTGCGACGAGCCCTCAGGCGAACCTCTCG
GCTTTCCCGCGCGGCGCCGCCTCTTGCTGCGCCTCCGCCTCCTCCTCTGCTCCGCCACCGG
CTTCCTCCTCCTGAGCAGTCAGCCCGCGCGCCGGCCGGCTCCGTTATGGCGACCCGCAGCC
CTGGCGTCGTGATTAGTGATGATGAACCAGGTTATGACCTTGATTTATTTTGCATACCTAA
TCATTATGCTGAGGATTTGGAAAGGGTGTTTATTCCTCATGGACTAATTATGGACAGGACT
GAACGTCTTGCTCGAGATGTGATGAAGGAGATGGGAGGCCATCACATTGTAGCCCTCTGTG
TGCTCAAGGGGGGCTATAAATTCTTTGCTGACCTGCTGGATTACATCAAAGCACTGAATAG
AAATAGTGATAGATCCATTCCTATGACTGTAGATTTTATCAGACTGAAGAGCTATTGTAAT
GACCAGTCAACAGGGGACATAAAAGTAATTGGTGGAGATGATCTCTCAACTTTAACTGGAA
AGAATGTCTTGATTGTGGAAGATATAATTGACACTGGCAAAACAATGCAGACTTTGCTTTC
CTTGGTCAGGCAGTATAATCCAAAGATGGTCAAGGTCGCAAGCTTGCTGGTGAAAAGGACC
CCACGAAGTGTTaATCACTATNNNGNCGNCTGCAGGTCGACCATATGGGANANCTCCCCAC
CGCNTTGNATGGCATAGCT

Figure 2. Cloning RNA 5' end. In our strategy, RNA is converted in RT to the first strand of cDNA with
a reverse primer of the gene of interest. TdT and dGTP are used to append a poly-dG to the cDNA 3'
end, which corresponds to the 5' end of the RNA. NewB is used to prime the synthesis of the second cDNA strand
with PCR Mastermix. NewB is a mixture of four poly-dC oligos with a linker sequence (NewD) at the 5'
end and one of the four bases at the 3' end (Table 1) and, thus, can be anchored on the last nt of the
cDNA. NewD and a reverse primer of the desired gene are then used in PCR to amplify the double-stranded
cDNA, followed by T-A cloning. As an example, RNA from HeLa cells was reverse-transcribed with the
HPRT1R683 primer. The HPRT1 cDNA was tailed with a poly-dG, followed by PCR with NewD+R683
that yielded a fuzzy band. Excision and purification of this band (boxed) as the template for a second
round of PCR with the same primers resulted in a dominant band and several smaller bands. Cloning
and sequencing the dominant band confirmed that it is the G-tailed 5' end of HPRT1 mRNA. In the
sequence obtained, the lowercase "t" before the NewB (underlined) and the "a" (added by Taq) after
the R683 (underlined) were the cloning sites. The sequences before the "t" and after the "a" belong
to the T-A vector. The lowercase "g" after NewB is the first nt of HPRT1 mRNA anchored by the NewB.
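
Reading the anchored nucleotide out of a sequenced clone can be done programmatically. The Python sketch below is a hedged illustration rather than the procedure used in this report: the function names are hypothetical, the primer sequences are taken from Table 1, and `clone` is an abridged toy version of the Figure 2 sequence in which most of the HPRT1 insert has been omitted.

```python
import re

NEWD = "TCAGGATTGATGGTGCCTACAGC"      # linker sequence from Table 1
HPRT1_R683 = "AACACTTCGTGGGGTCCTTT"   # reverse primer from Table 1


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (case is preserved)."""
    pairs = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(pairs)[::-1]


def anchored_insert(clone):
    """Return the insert between the NewB poly-dC run and the reverse primer.

    The first base of the returned string is the nucleotide on which the
    NewB oligo anchored, i.e., the first nt of the cloned 5' end.
    Returns None if either flank cannot be found.
    """
    clone = clone.upper()
    start = re.search(NEWD + "C*", clone)             # linker plus poly-dC run
    end = clone.find(reverse_complement(HPRT1_R683))  # primer site in the clone
    if start is None or end < start.end():
        return None
    return clone[start.end():end]


# Abridged toy clone: vector, NewB, a short piece of insert, R683 site, vector.
clone = ("GGGATt" + NEWD + "C" * 12 + "gGCGGGGCCTGCTTCTCC"
         + "AAAGGACCCCACGAAGTGTT" + "aATCACTAT")

insert = anchored_insert(clone)
print("anchored nucleotide:", insert[0])
```

Because the regular expression consumes the whole poly-dC run, the result does not depend on how long the poly-dG tail was, which mirrors the purpose of the anchoring base in NewB.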

cDNA protection assay. We established a new strategy that uses
cDNA to supersede cRNA and accordingly uses S1 to replace
RNase in the RNA protection assay. In this strategy, an RNA aliquot
is primed by random hexamers in RT and converted to cDNA.
An aliquot of the cDNA is used to hybridize with a commensurate
amount of the RNA (Fig. 3A). S1 is added to digest the
non-hybridized cDNA and RNA and is then inactivated. PCR
with gene-specific primers (GSP) ensues to amplify a fragment
of the RNA-protected cDNA. As a negative control, an equal
aliquot of the cDNA is directly digested with the same amount
of S1 to ensure that without hybridization, no cDNA is left to be
amplified in PCR. The amount of S1 may need to be optimized
for each target gene because too little enzyme activity may not
be sufficient to remove all single-stranded cDNAs and may thus
cause false positivity, whereas too much enzyme may be harmful
because S1 is known to have weak activity
toward double-stranded DNA.

In the MCF7 human breast cancer cell line, the BCAS4 (breast cancer
amplified sequence 4) at 20q13 of the human genome forms a fusion gene
with the BCAS3 at 17q23, which is transcribed and alternatively spliced
to different chimeric RNAs, most of which contain exon 1 of BCAS4 and
exons 24 and 23 of BCAS3.5,21,29 We performed RT with random hexamers
and then PCR with primers at exon 1 of BCAS4 and exon 25 of BCAS3, which
resulted in a dominant band at about 650 bp and several minor and smaller
bands (Fig. 3B). Pretreatment of the RNA sample with DNase I followed by
inactivation of the enzyme (as discussed later) caused partial losses of
the minor bands, which redistributed the primers and thus increased the
abundance of the dominant band (Fig. 3B). An aliquot of the cDNA was
hybridized with an equivalent amount of the RNA sample. Moreover, the RT
product (1/20) was also used in PCR to amplify a fragment of CCND1, which
was purified from agarose gel. Half of the purified CCND1 cDNA was added
into the hybridization reaction as an indicator of whether double-stranded
DNA could withstand the hybridization and the S1 digestion. After
hybridization, S1 was added to digest the non-hybridized RNAs and cDNAs,
while the same amount of the cDNA as used in hybridization was also
S1-treated as a negative control (Fig. 3C). After inactivation of S1,
PCR with the BCAS primers yielded a band from the S1-treated
cDNA/RNA hybrids, but not from the non-hybridized cDNA,
as expected (Fig. 3C). Similarly, PCR amplification of CCND1
as described above also yielded the anticipated band from the
hybridized cDNA but not from the non-hybridized counterpart
(Fig. 3C). Cloning the BCAS4-BCAS3 band and sequencing
three resultant plasmid clones revealed that in clones 1 and 2, the
canonical 3' end of exon 1 of BCAS4 was fused to the canonical
5' end of exon 24 of BCAS3, whereas clone 3 lacked the last
2 nt (i.e., AG) of exon 1 of BCAS4 and the whole exon 24 of
BCAS3 (Fig. 3C).

Figure 3. cDNA protection assay. (A) In this strategy, an RNA aliquot is converted to cDNA in RT with random hexamers. An aliquot of the cDNA is
hybridized with an equivalent amount of RNA, followed by digestion of the non-hybridized cDNA and RNA with S1. S1 is inactivated and PCR with
gene-specific primers (F and R) ensues to amplify the RNA-protected cDNA. As a negative control, an equal aliquot of cDNA is digested with S1 to
ensure that without hybridization, no cDNA is left for amplification by PCR. (B) RT with random hexamers and with RNA sample from MCF7 cells that
was not treated or was treated with DNase I followed by inactivation of the enzyme. PCR with BCAS4F1+BCAS3R25 primers detects a dominant band at
about 650 bp and several minor and smaller bands. (C) An equal amount of RT product (cDNA) was hybridized (H) or non-hybridized (N) with a commensurate
amount of RNA, followed by S1 digestion. PCR with BCAS4F1+BCAS3R25 primers detected the dominant BCAS4-BCAS3 cDNA (BCAS) in
hybridized, but not in non-hybridized, RNA aliquot. As a control, a CCND1 PCR fragment was added into the hybridization reaction as an indicator that
double-stranded DNA could withstand the hybridization and the S1 digestion as it can be amplified by PCR, whereas single-stranded CCND1 cDNA
in non-hybridized RT product was digested by S1 and thus could not be amplified by PCR. Cloning the BCAS band and sequencing three randomly
selected plasmid clones revealed that in clones 1 and 2, the 3' end of exon 1 of BCAS4 was fused to the 5' end of exon 24 of BCAS3, whereas clone 3 lacks
the last two nt (underlined "ag" shown in clones 1 and 2) of exon 1 of BCAS4 and the whole exon 24 of BCAS3.

RT product primed by endogenous random primers in
RNA samples. When performing RT, we often set up a reaction
without adding primers as a negative control, but this reaction
still consistently yielded cDNAs. As examples, PCR with 1/20
(1 µl) of such non-primer RT product as the template could
amplify CCND1, CDK4 or HPRT1 cDNA (Fig. 4A). The same
CDK4 primers did not produce any band when the template was
replaced by water (lanes 1 vs. 3 in Fig. 4A), confirming that the
PCR reagents were not contaminated by cDNA templates.

In our routine practice, we often treated RNA samples with
a low concentration (several units) of DNase I, followed by inactivation
of the enzyme with different methods including protein
extraction with phenol/chloroform, so that genomic DNA
residuals would not be mis-primed in the ensuing RT-PCR.41
RNA samples from HeLa, PC-3 and ZR75-1 cell lines that were
pre-treated with DNase I followed by inactivation of the enzyme
with 13 mM EDTA at 72°C for 15 min were used in RT without
adding primer. PCR with the RT product as the template
could still amplify the cDNA of several genes (Fig. 4B). Treatment of
RNA samples with a much larger amount of DNase I could not
eliminate the PCR-amplified bands, although the DNase activity
could not be completely inactivated and the remaining activity
decreased the detected level (data not shown). These results suggest
that RNA specimens contain endogenous random primers
(ERP) for RT that cannot be removed by DNase treatment.

Antisense-caused RT-PCR artifacts. We infer that when an
antisense is expressed and overlaps with the sense RNA at their 5'
or 3' ends, any cloning approach that involves RT-PCR, including
our method, may create artifacts, although many peers still
use RT-PCR in cloning under this situation. We used the CDK4/
TSPAN31 relationship to test this hypothesis, since the CDK4
mRNA and the TSPAN31 mRNA overlap at their last 517 nt
(Fig. 5A). We designed a forward primer at the penultimate exon
of CDK4 (CF933) or TSPAN31 (TF647); one set of these two
primers also contained the NewD sequence at the 5' end as a linker
(NewDCF933 and NewDTF647). Other primers are illustrated
in Figure 5A. A forward primer of one strand can also serve as
a reverse primer of the opposite strand. UnDNased RNA from
HeLa cells was used in RT with NewDTF647 as the GSP,
which should specifically convert the CDK4 mRNA to cDNA
(RT-B in Fig. 5B) if the mRNA reaches the TF647 region as we
hypothesized. PCR using this RT-B product as the template and
CF136+R822 as the primer pair could indeed amplify a correct
CDK4 band as expected (lane 4 in Fig. 5B). PCR with NewD
as the reverse and CF1096 as the forward primer resulted in
a 1.5 kb band (lane 6 in Fig. 5B). Cloning and sequencing this
band confirmed that it was part of CDK4 mRNA containing the
84-bp intron 5 of TSPAN31. These results together indicate that
some CDK4 mRNAs reach at least the TF647 region, making
the overlapped region much longer than what is shown in the
NCBI (thick arrow in Fig. 5A).
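
The design of the linker-containing primers can be checked directly from the sequences in Table 1: each is the NewD linker followed by the gene-specific primer. The Python sketch below is an illustrative check only; the dictionary names and the helper `split_linker_primer` are hypothetical, while all sequences are copied from Table 1.

```python
NEWD = "TCAGGATTGATGGTGCCTACAGC"  # linker sequence from Table 1

GENE_SPECIFIC = {
    "CDK4F933": "GATGACTGGCCTCGAGATGT",
    "TSPAN31F647": "CTTAAGCATTCAGACGAAGC",
}

LINKER_PRIMERS = {
    "NewDCF933": "TCAGGATTGATGGTGCCTACAGCGATGACTGGCCTCGAGATGT",
    "NewDTF647": "TCAGGATTGATGGTGCCTACAGCCTTAAGCATTCAGACGAAGC",
}


def split_linker_primer(primer, linker=NEWD):
    """Return the gene-specific 3' part of a primer carrying the linker at its 5' end."""
    if not primer.startswith(linker):
        raise ValueError("primer does not start with the linker sequence")
    return primer[len(linker):]


for name, sequence in LINKER_PRIMERS.items():
    print(name, "->", split_linker_primer(sequence))
```

Placing the linker at the 5' end leaves the gene-specific 3' end free to anneal and be extended in RT, while the linker provides a priming site for NewD in the ensuing PCR.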

Similarly, RT primed by NewDCF933 should specifically
convert the TSPAN31 mRNA to cDNA if the mRNA reaches the
CF933 region (RT-A in Fig. 5B). PCR using this RT-A product
as the template and TF73+TR860 as the primer pair yielded
the correct TSPAN31 band (the top band in lane 1 in Fig. 5B).
However, a smaller band was also produced that was confirmed
by T-A cloning and sequencing to be the LMTK2 mRNA from
chromosome 7, but not the TSPAN31. Alignment of the LMTK2
and CDK4 sequences suggests that the LMTK2 cDNA is more
likely to be primed by an endogenous primer in the RNA sample
as described above, but not by NewDCF933, indicating that
RT using GSP is not as gene- and strand-specific as it is supposed
to be.

Figure 4. RT primed by endogenous random primers (ERP). (A) UnDNased RNA
sample from HeLa cells was used in RT without adding primers. The RT product
(1/20) was used as the template in PCR with the F70+R1067 for CCND1, F136+R822
for CDK4 and F123+R683 for HPRT1, respectively. As a negative control, the same
CDK4 primers were used in a PCR with H2O to replace the RT product as the
template (lanes 1 vs. 3). (B) RNA samples from HeLa, PC-3 and ZR75-1 cell lines
were treated with DNase I, followed by inactivation of the enzyme. RT was then
performed without adding primers. The RT product was used as the template
in PCR amplification of CCND1, CDK4 and HPRT1 as in (A) or of c-myc with the
F125+R838 primers. In an additional CCND1 PCR, the RT product was replaced
by H2O as the template.


PCR using the RT-A product as the template and
NewD+TF647 as the primer pair did not yield any band, which
was discrepant with the results in lane 1 (lanes 1 vs. 5 in Fig. 5B).
More surprisingly, PCR using TF73+TR860 as the primers and
RT-B as the template yielded the same two bands as when RT-A
was used as the template (lanes 1 vs. 3 in Fig. 5B), although the
RT-B primed by NewDTF647 was not supposed to convert the
3' part of TSPAN31, or any RNA from this region, to cDNA.
Similarly, PCR using CF136+R822 as the primers and RT-A as
the template generated the same band as when RT-B was used
as the template (lanes 2 vs. 4 in Fig. 5B), although the RT-A
primed by NewDCF933 was not supposed to convert the 5' part
of CDK4, or any RNA from this region, to cDNA. A reasonable
explanation for these inconsistent results is that some CDK4 and
TSPAN31 mRNAs have an unprotected 3' end at the overlapped
region, which serves as a primer to extend its cDNA as illustrated in
Figure 5C. This extension may happen in RT or in the ensuing
PCR as discussed later. In other words, the bands in lanes 2 and 3
in Figure 5B do not have the corresponding RNA as the original
template and thus are wrong-template artifacts. The TSPAN31
band (the top one) in lane 1 of Figure 5B might be such a wrong-template
artifact as well, which explains why NewD+TF647
failed to yield a PCR product (lane 5 in Fig. 5B). In line with
this conjecture, a PCR with the RT-A product as the template
and CF933+TR1668 as the primer pair did not produce any
band (data not shown).
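
The wrong-template extension proposed above can be illustrated with a toy model in which the 3' end of one strand pairs with the 3' end of its antisense partner and is then extended by copying that partner. The Python sketch below is a simplified illustration under stated assumptions (perfect base pairing, both strands written 5' to 3', short made-up sequences); the function names are hypothetical and the code does not model reaction kinetics.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def three_prime_overlap(strand_a, strand_b, min_len=4):
    """Length of the longest 3'-terminal stretch of strand_a that can pair
    with the 3' end of strand_b (0 if shorter than min_len)."""
    longest = min(len(strand_a), len(strand_b))
    for k in range(longest, min_len - 1, -1):
        if strand_a[-k:] == reverse_complement(strand_b[-k:]):
            return k
    return 0


def extend_on_wrong_template(strand_a, strand_b, min_len=4):
    """Extend the 3' end of strand_a by copying strand_b if their 3' ends pair."""
    k = three_prime_overlap(strand_a, strand_b, min_len)
    if k == 0:
        return strand_a  # no pairing, so no extension
    return strand_a + reverse_complement(strand_b[:-k])


# Made-up strands whose last 5 nt are complementary (ATCGA pairs with TCGAT).
sense = "GGGTTTATCGA"
antisense = "CCCAAATCGAT"

print(extend_on_wrong_template(sense, antisense))
```

In this toy case the extended product carries sequence that was never part of the original sense strand, which is what makes a downstream PCR band a wrong-template artifact.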

Discussion 

Features of our cloning methods. Routine 5' or 3'RACE usually
can only clone short cDNA fragments, sometimes making
it unclear whether the cloned cDNA end belongs to a chimeric
RNA or to an mRNA of the parent gene (Fig. 1A), in part
because the first and last exons are often very large.
Moreover, 5'RACE is difficult as its first several steps
are manipulations of fragile RNA. Our cloning methods
start with RT and soon proceed with the synthesis
of the second cDNA strand, with all later steps involving
only double-stranded cDNA that is much more
stable. Since almost the entire second strand can be synthesized,
either one of our cloning methods can clone
virtually the entire cDNA. A difference is that our 5'
cloning method involves PCR amplification and, thus, is
more efficient, whereas our 3' cloning method is a non-amplified
approach with low efficiency but high fidelity.
In addition, our 3' method does not require a poly-dT
primer, allowing cloning of those RNAs without a poly-A
tail and eliminating mis-priming to an internal poly-A
sequence. Actually, we once used 3'RACE to clone the
3' end of TSPAN31 with a poly-dT primer that contains
a linker (coined as NewA; Table 1), followed by PCR
with the linker sequence as a primer (coined as NewC;
Table 1). The 3' end cloned lacks the last 17-nt sequence
(GAC CAT TAA AAA AAA AA) because there is a
14-adenine sequence in front of it (data not shown).
If needed, however, our method can still use poly-dT
primers in RT, alone or in combination with PCR, to enhance
the cloning efficiency.
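
Internal adenine-rich stretches of the kind that misled the poly-dT primer in the TSPAN31 example can be flagged in a known transcript sequence before a poly-dT-primed experiment. The Python sketch below is a minimal illustration, not a method used in this report: the function name and the 10-nt threshold are assumptions, and the toy sequence only mimics the reported arrangement (a 14-adenine run followed by the 17-nt 3' end), with a made-up upstream portion.

```python
import re


def internal_poly_a_sites(mrna, min_run=10):
    """Return (start, length) of A-runs located upstream of the 3'-terminal A-run.

    start is a 0-based index into the sequence; min_run is the shortest
    run that is reported (an assumed threshold).
    """
    mrna = mrna.upper()
    body_end = len(mrna.rstrip("A"))  # exclude the terminal run of adenines
    runs = re.finditer("A{%d,}" % min_run, mrna[:body_end])
    return [(m.start(), m.end() - m.start()) for m in runs]


# Toy transcript: made-up upstream bases, a 14-adenine run, then the 17-nt end.
toy_transcript = "GCTAGCTAGG" + "A" * 14 + "GACCATTAAAAAAAAAA"

print(internal_poly_a_sites(toy_transcript))
```

A non-empty result suggests that a poly-dT primer may anneal upstream of the true 3' end, in which case random-hexamer priming as in our 3' method avoids the problem.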

In our practice of molecular cloning, a poly-dT primer is often
used to prime the poly-A tail in RT, whereas a poly-dG oligo longer
than a hexamer (GGG GGG) is technically difficult to synthesize,
purify and verify and, thus, is much more expensive.
Therefore, tailing a cDNA with poly-dG followed by priming it
with a poly-dC oligo becomes the only practical choice for our
5' end cloning method. The length of the poly-dG tail may
differ among tailed targets, but one of the four oligos in the
NewB mixture (Table 1) should be anchored on the last nt of the
targeted cDNA, regardless of the length of the poly-dG tail and
whether several nt have been added to the 3' end of the first cDNA strand
by the MMLV during the RT.4,18 However, it remains
possible that the NewB mis-primes an internal poly-G sequence,
which has actually happened in our practice.

Similar  to  routine  5'  and  3'  RACEs,  our  methods  only  use  a 
single  GSP  and,  thus,  may  still  cause  unspecific  bands  as  shown 
in  Figure  2.  Moreover,  cDNA  may  have  breakages  and,  similar 
to  routine  RACEs,  our  methods  cannot  distinguish  a  genuine 
cDNA  end  from  a  spurious  one.  In  addition,  transcription  may 
be  initiated  from  or  terminated  at  alternative  sites.  Therefore, 
cloning  multiple  bands  and  sequencing  multiple  plasmid  clones 
are  strongly  recommended,  not  only  to  avoid  artifacts  but  also  to 
increase  the  chance  of  identifying  alternative  5'  or  3'  ends. 

Figure 5. RT with linker-containing GSP to detect CDK4 (NM_000075.3) and TSPAN31 (NM_005981.3). (A) Illustration of the CDK4/TSPAN31 relationship
according to the NCBI, with the locations of CDK4 forward (CF) or reverse (CR) primers and TSPAN31 forward (TF) or reverse (TR) primers indicated.
Boxes represent exons with their length indicated as the number of nt. Note that the two mRNAs overlap at their last 517 nt. (B) RT of unDNased RNA
from HeLa cells primed by NewDCF933 should specifically convert the TSPAN31 mRNA to cDNA (RT-A), whereas RT primed by NewDTF647
should specifically convert the CDK4 mRNA to cDNA (RT-B), if the mRNAs reach the regions of these primers. These two RT products were used as the
template in PCR with either TF73+TR860 (PCR-a, lanes 1 and 3) or CF136+CR822 (PCR-b, lanes 2 and 4) as the primer pair or in PCR with the
primer pair of NewD+TF647 (PCR-c, lane 5) or NewD+CF1096 (PCR-d, lane 6). The results in lanes 4 and 6 together with sequence data suggest that
some CDK4 transcripts reach the TF647 region (thick black arrow in A). (C) When sense or antisense RNA has an unprotected 3' end overlapping with
the other, the overlapping sequence may serve as a primer in RT to extend its 3' end with its antisense as the template. The extension may also occur
in the ensuing PCR, as depicted in Figure 6A. Because the mRNA does not really have this extended part (dashed lines), the corresponding PCR product,
such as the bands in lanes 1, 2 and 3 in (B), is a wrong-template artifact.

Merits of cDNA protection assay. The strategy to protect
a cDNA instead of the parental RNA has four major merits:
(1) After being protected by the parental RNA, the cDNA can
be PCR-amplified, which dramatically increases the sensitivity.
If part of the cDNA is an RT artifact, it will not be protected
because the single-stranded part of the cDNA or of the
parental RNA will be digested by S1. Single-stranded DNA is
about 5-fold more sensitive to S1 than RNA, as stated in the
supplier's datasheet of S1 nuclease. (2) The protected cDNA can
be directly cloned and sequenced to confirm its identity, whereas
in RNA protection assay, the protected RNA still needs to be
converted to cDNA if a long fragment needs to be sequenced
at a high quality. (3) It is still technically difficult to determine
from which DNA strand an RNA is transcribed. Use of
strand-specific DNA oligos to supersede cDNA in our protection
assay may be the best way for this purpose, as discussed
later. (4) A DNA/RNA hybrid has a unique structure and composition
that are distinguishable from those of a DNA/DNA or RNA/RNA
hybrid,40 in part because DNA/DNA contains dA and
dT, RNA/RNA contains rA and rU, while DNA/RNA contains
all four. These differences should provide us with unique
strategies to develop sensitive methods and instruments for the
detection and quantification of those DNA/RNA hybrids that
are at very low abundance. Such strategies should be applicable
and, thus, intriguing, as endogenous DNA/RNA hybrids in
eukaryotic cells are far fewer than the DNA/DNA and RNA/RNA
hybrids, especially when a larger DNA/RNA fragment is
designed for protection.

In most assays the probe is used in great excess compared with
the target. We suggest that if our method is used mainly to verify
the true existence of an RNA transcript, the RNA sample should
be considered as the probe and, thus, used in great excess relative
to the cDNA. Conversely, if the aim is to quantify the RNA
expression level, the cDNA should be regarded as the probe and
used in great excess. A set of nested PCRs, including those with
one primer in the S1-digested region, should help in authenticating
the RNA and, thus, is highly recommended, especially when
T-A cloning and sequencing of the resultant plasmids are omitted
for whatever reason.


Unvanquished obstacles set by ERP in RT. Although retroviruses
use cellular tRNA to prime mRNA for reverse transcriptases
to synthesize the 1st DNA strand,31,32 endogenous small
RNAs such as mRNA fragments can also efficiently prime cDNA
synthesis by reverse transcriptases.12,20,22 RNA samples contain a
huge number of short RNA fragments, such as degraded RNAs,
excised introns and other processed mRNAs that have become known
only recently,1 which can serve as ERP for RT. This is likely the
reason why RT can occur without addition of primers, a phenomenon
coined by others as "background priming."2,15 This also
explains why DNase treatment of RNA samples cannot eliminate
cDNA generation in the ensuing RT. Actually, during DNase
treatment and inactivation, some RNAs are likely degraded to become
ERP. Besides, short genomic and mitochondrial DNA fragments
resulting from degradation or incomplete DNase digestion can
also serve as ERP.

The  presence  of  ERP  should  not  affect  the  RT  results  from 
random  hexamers,  and  may  not  affect  the  results  from  poly- 
dT  primer  either  if  polyadenylation  is  not  a  specific  concern. 
However,  the  results  from  GSP  may  no  longer  be  gene-  and 
strand-specific,  not  only  because  GSP  may  mis-prime,  which  is 
familiar  to  us,  but  also  because  ERP  can  prime  other  RNAs, 
including  the  antisense  of  the  interested  RNA  if  it  exists.  The 
gene-specificity  may  be  improved  by  adding  a  linker  sequence, 
herein  NewD,  to  the  3'  end  of  the  GSP  and  using  it  as  one  primer 
in  the  ensuing  PCR.  The  strand-specificity  may  also  be  improved 
in  this  way  if  the  antisense  RNA  level  is  relatively  low  and  the 
amount  of  Linker-GSP  is  carefully  managed,  as  shown  in  lanes 
5  vs.  6  in  Figure  5B  and  as  depicted  in  Figure  6A.  However, 
this strategy may not always work well. When we tested this
strategy by determining the existence of LOC100996515 RNA
with primers at the region overlapped by CCND1, CCND1 as
its antisense was often detected because of its much higher
abundance. A better way to ensure the strand-specificity may be to
use strand-specific, probably labeled, DNA oligos to replace cDNA
probes in hybridization with the RNA of interest. Such
strand-specific DNA oligos can be synthesized in vitro like a primer
or made in other ways,10 including PCR with one biotinylated
primer followed by capture with streptavidin-coated magnetic
beads, or PCR with one 5'-phosphorylated primer10 followed by
digestion of the unwanted, 5'-phosphorylated strand using lambda
exonuclease.10,34

Figure 6. Depiction of artifacts caused by ERP or by 5' or 3' overlapping. (A) Because the RNA sample contains ERP, RT with Linker-GSP will also generate
the first cDNA strand of the antisense RNA, besides the cDNA of the desired sense RNA. When the RT product (usually only 1 µl) is added into the
PCR mixture as the template, some residual Linker-GSP is transferred with it, which primes the synthesis of a linker-containing antisense fragment.
The fragment is amplified in the later PCR cycles. However, because the PCR mixture contains many more copies of the GSP and the gene-specific
reverse primer (GSRP) than of the residual Linker-GSP, the first PCR cycle should generate many more copies of the desired sense cDNA, which titrates out
the antisense in later PCR cycles, unless the antisense RNA is expressed at a much higher level than the sense. We use as small an amount of Linker-GSP
as possible in the RT to minimize its residual in the RT product. (B) Two cDNAs that overlap at their 5' ends, no matter whether they are unrelated or
originate respectively from sense and antisense transcripts that overlap at their 3' ends (like CDK4 and TSPAN31), can be converted to 3'-overlapped
counterparts after one round of PCR by GSP or ERP, which, in turn, creates wrong-template extension in later PCR cycles as depicted in Figure
5C. (C) If a cDNA has an unprotected 3' end that is reverse-complementary to an unrelated cDNA (in red), this matched part (e.g., ATCGA/TAGCT)
and this other cDNA may serve in PCR as the primer and the template, respectively, to create a spurious chimera.

Although GSP has been widely used in RT-PCR for decades,
to our knowledge none of the published studies has addressed this
possible spuriousness or provided a corrective measure as we do
herein. Since routine GSP-primed RT is likely neither gene- nor
strand-specific, whether those published data need to be
reevaluated or reinterpreted becomes an uncomfortable but unavoidable
question that peers need to bear in mind, in our humble opinion.


Unsolved, overlap-caused artifacts of cDNA ends and chimeras.
The NCBI has updated the sequences of CDK4 and TSPAN31
several times and keeps extending the overlapped region. One of
the CDK4 sequences we obtained is longer than the latest NCBI
version. We surmise that all old and new sequences may be
correct, representing different variants with different lengths of the
overlap. However, cloning the 3' end of each of these mRNAs is
technically difficult for two major reasons: (1) ERP will result
in cDNA of the antisense in RT (Fig. 6A). (2) One of the mRNA
molecules, either CDK4 or TSPAN31, may have an unprotected
3' end at the overlapped region, due to reasons such as
degradation (breakage), premature transcription, deadenylation, early
termination of RT, etc., occurring either as a physiological event
or as an artifact. This unprotected 3' end will serve as a primer to
extend its cDNA with an antisense RNA as the template in RT,
creating a wrong-template artifact (Fig. 5C). In this situation,
the 3' end cloned by any approach that involves RT-PCR, including
our method, could be an artifact. This artifact may also occur in
PCR, if not already in RT, because one round of PCR, primed by
either GSP or ERP, will convert two 5'-overlapped cDNAs to two
3'-overlapped ones (Fig. 6B). The biomedical community should be
particularly alert to this pitfall, because RT-PCR is so often used
to clone RNA without ruling out the existence of overlapped
antisense RNA. So far we are still unable to escape this trap
and, thus, unable to clone the genuine 3' end of the extended
CDK4 mRNA shown in lane 6 of Figure 5B, or to determine
whether TSPAN31 also has mRNA variant(s) extended beyond
the latest NCBI version.

Our results also alert us to another pitfall: if routine or
quantitative RT-PCR is used to determine the expression level
of an RNA that is accompanied by an overlapped antisense
transcript, PCR with primers at the overlapped region starts with
two templates and, thus, will overestimate the expression level
(doubling it if the two transcripts are equally abundant).
Therefore, the locations of the primers matter, and primers at
different regions of the RNA should be used. Since over 63%
of RNA transcripts may be accompanied by antisense
counterparts,24 peers should be alert to the pitfalls described above.

A huge number of putative chimeric RNAs encompass a
short homologous sequence shared by the two partners.27 The
reason is unknown but it has led to discussions on how such
chimeras are formed.3,27,39 Our observation of wrong-template
extension created by overlap at the 5' or 3' end suggests to us
that some, likely many, chimeras of this type may simply
be RT-PCR artifacts: If, as described above, an RNA or a
cDNA has an unprotected 3' end that is reverse-complementary
to an unrelated (i.e., not its antisense) RNA or cDNA, a
chimeric sequence may be generated in RT or PCR (Fig. 6C).
Since a pentamer can prime RT or PCR efficiently, the
homologous part can be as short as a 5-nt sequence, although
whether a shorter oligo still has some priming ability is not so
clear. Because the RNA repository in any human cell contains
numerous such short homologous sequences, we tend to believe that
many of those chimeras containing a short homologous sequence and
obtained by approaches that involve RT, PCR or similar
methods are such technical artifacts.
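
The mechanism described above can be illustrated with a minimal sketch. It assumes that an unprotected 3' end primes wherever its last k nucleotides (k = 5, following the pentamer argument) find a perfect reverse-complementary site on another molecule. The function names and the example sequences are hypothetical and are not taken from our data.

```python
# Minimal sketch: locate sites on a template where the unprotected 3' end
# of another strand could anneal and be extended (wrong-template priming).
# Assumption: priming needs a perfect match over the last k nt (k = 5).

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def find_3prime_priming_sites(primer_strand: str, template: str, k: int = 5) -> list[int]:
    """Return 0-based start positions on `template` (written 5'->3') that are
    reverse-complementary to the last k nt of `primer_strand`."""
    site = reverse_complement(primer_strand[-k:])
    template = template.upper()
    return [i for i in range(len(template) - k + 1) if template[i:i + k] == site]


# Hypothetical sequences: the first strand ends with ATCGA at its 3' end.
strand_with_free_end = "GGCTTACCATCGA"
unrelated_template = "AAGGTCGATCCTT"
print(find_3prime_priming_sites(strand_with_free_end, unrelated_template))
```

A non-empty result only means that a spurious chimera is possible at that site; whether it forms in a real reaction depends on conditions that this sketch does not model.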

Materials  and  Methods 

RNA preparation, DNase I treatment and RT. Total RNA was
extracted from indicated cell lines using TRIzol (Invitrogen,
Cat. 15596-026). In some experiments, RNA was treated
with DNase I (1-3 units) to remove genomic DNA residuals,
followed by inactivation of the DNase with 15 mM EDTA at
72°C for 15 min. An aliquot (4-5 µg) of total RNA was
reverse-transcribed to the first strand of cDNA with indicated primers
and M-MLV Reverse Transcriptase (Promega, Cat. # M1705;
www.promega.com), following the manufacturer's instruction,
but in a 20-25 µl volume.

Primer nomenclature. We used "F" and "R" to indicate a
forward and a reverse primer, respectively. Each primer's name
ends with a number that indicates the position in the mRNA, i.e.,
the distance from the first nt of the mRNA, of the first (for F) or
the last (for R) nt of that primer. Thus, the F-to-R range is the
size of an RT-PCR amplified DNA fragment in agarose gel. All
primers are listed in Table 1. More details of the primer design
principle were described before.41
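
As an illustration of this nomenclature, the following minimal sketch derives the expected fragment size from a pair of primer names. It assumes that both positions are counted inclusively, so that the size is R - F + 1; the function names are hypothetical, and names that do not end with F or R plus a number are rejected.

```python
import re


def primer_position(name: str) -> tuple[str, int]:
    """Split a primer name such as 'F665' or 'HPRT1R683' into its
    direction ('F' or 'R') and its position on the mRNA."""
    match = re.search(r"([FR])(\d+)$", name)
    if match is None:
        raise ValueError(f"primer name does not end with F/R plus a number: {name}")
    return match.group(1), int(match.group(2))


def amplicon_size(forward: str, reverse: str) -> int:
    """Expected size (bp) of the fragment amplified by a primer pair,
    assuming inclusive positions: size = R - F + 1."""
    f_dir, f_pos = primer_position(forward)
    r_dir, r_pos = primer_position(reverse)
    if f_dir != "F" or r_dir != "R":
        raise ValueError("expected a forward (F) and a reverse (R) primer")
    if r_pos < f_pos:
        raise ValueError("reverse primer lies upstream of the forward primer")
    return r_pos - f_pos + 1


print(amplicon_size("F665", "R1086"))  # 422
print(amplicon_size("F136", "R1086"))  # 951
```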

Purification of DNA and T-A cloning. The PCR-amplified
cDNA fragment was fractionated in 1% agarose gel and
visualized by ethidium bromide staining. The desired band was then
excised and purified with the UltraClean Gel DNA Extraction
Kit (ISC BioExpress; www.bioexpress.com) following the
manual, or with a simple method we described before.42 The
purified DNA was ligated into a pGEM-T Easy Vector (Promega;
www.promega.com).

RNA 3' end cloning. RT was performed using RNA from
HeLa cells and primed by random hexamers, with other
conditions as described above. The 3' part of the second strand of
CDK4 cDNA was synthesized using 1/3 to 1/2 of the RT
products, 100 nM CDK4 F665 primer and 1x PCR Mastermix, with
one cycle of 95°C for 5 min, 60°C for 2 min and 72°C for 15 min
in a thermocycler. Ten or 15 units of S1 nuclease (Cat # 18001-016;
www.invitrogen.com) was added, followed by incubation at
room temperature for 60 min to digest the 3' overhang of the first
cDNA strand and all single-stranded cDNAs or mRNAs. EDTA
was added to a final concentration of 10-15 mM with
incubation at 72°C for 15 min to inactivate S1. To remove EDTA, the
reaction was transferred to an Eppendorf tube with additions of
0.35 ml H2O and 1.2 ml 95% ethanol, followed by precipitation
at -20°C for 20 min and then centrifugation at 13,000 rpm at
4°C for 15 min. The ethanol was discarded and the cDNA
pellet was suspended with 14 µl H2O. To ensure that the 3'
overhang of the first cDNA strand had been removed by S1 but the
double-stranded fragment was protected, 2 µl of the recovered
double-stranded cDNA was used as the template to run 40 cycles
of PCR with the F136+R1086 or the F665+R1086 primer pair
(Fig. 1). The remaining (10 µl) double-stranded cDNA was then
added with 10 µl PCR Mastermix, followed by incubation at
72°C for 10 min to append a dA at the cDNA blunt ends. A
portion (herein 6 µl) of the dA-appended cDNA was cloned into a
T-A vector. The resultant plasmid clones were first confirmed by
PCR with the F665+R1086 primers and then sequenced with a
vector primer.

RNA 5' cloning with G-tailing. RT was performed using
RNA from HeLa cells as described above, but with HPRT1R683
as a gene-specific reverse primer. After being run through
RapidTip2 (Cat # RT050-096; www.midsci.com) to remove
primers, enzymes, dNTP and debris, an aliquot (1/4) of the RT
product was transferred into a 500-µl tube with additions of
30 units of TdT (www.promega.com; Cat # M828C), two units
of RNase H, 4 mM dGTP, and 2 mM MnCl2 in a final volume of
25 µl, followed by incubation at 37°C for 30-60 min to
synthesize a poly-dG tail. The TdT was inactivated by heating to 72°C
for 10 min. About half of the dG-tailed product was primed by a
NewB mixture (Table 1), with 1x PCR Mastermix in a 20-µl
volume, to synthesize the second cDNA strand by one cycle of 95°C
for 5 min, 60°C for 2 min and 72°C for 15 min in a
thermocycler. About 1/4 of the double-stranded HPRT1 cDNA was then
used as the template to run PCR with the NewD+HPRT1R683
primer pair for 40 cycles of 95°C for 30 sec, 60°C for 30 sec and
72°C for 60 sec. The PCR product appeared as a fuzzy band in
agarose gel and, thus, was excised and purified as the
template for a second round of PCR, followed by excision and
purification of the dominant band for T-A cloning (Fig. 2).

cDNA protection assay. RT was performed using RNA from
MCF7 cells in a 25-µl volume and primed with random
hexamers. The RT product was incubated at 72°C for 15 min with about
10-15 mM EDTA to inactivate the RNase H and DNA polymerase
activities of the M-MLV. To remove EDTA, the RT product was
transferred to an Eppendorf tube with additions of 0.35 ml H2O
and 1.2 ml 95% ethanol, followed by precipitation of the cDNA
as described above. The cDNA was suspended in 20 µl of H2O.
In a 500-µl tube, the hybridization was set up with 1/10-1/5
(2-4 µl) of the cDNA and an equivalent amount of the RNA
sample in a 50-µl solution containing 25% formamide (v/v), 600
mM NaCl, 30 mM Tris-HCl (pH 7.5), 0.1% SDS, 10 mM DTT
and 4 mM EDTA, as described before.28 Moreover, 1 µl (1/20) of
the cDNA was also used in PCR to amplify the CCND1 cDNA
with the F70+R1067 primer pair, and the PCR product was
purified from agarose gel. About half of the purified CCND1 PCR
product was added into the hybridization reaction as an
indicator of whether the hybridization and the ensuing S1 digestion
degrade double-stranded DNA. After topping with 35 µl mineral
oil (purchased from a Walmart store; product #831432DB1) to
prevent evaporation, the hybridization reaction was performed at
68°C for 8 h or longer. After transfer to an Eppendorf tube, the
reaction was diluted and precipitated with additions of 0.35 ml
H2O and 1.2 ml 95% ethanol as described above. The hybrids
were suspended in 18 µl H2O and divided into three aliquots
for digestion with 0, 10 or 15 units of S1 in a final volume of
20 µl at room temperature for 60 min. As a negative control
(Fig. 2), a separate S1 digestion was set up with equal amounts
of cDNA (the RT product) and S1. The S1 was then inactivated
with 10-15 mM EDTA at 72°C for 15 min. The EDTA was
removed by dilution and precipitation with additions of 0.35 ml
H2O and 1.2 ml 95% ethanol at -20°C as described above. The
cDNA/RNA hybrids were suspended in 16 µl H2O, 3 µl of
which was used to run PCR with the BCAS4E1F+hBCAS3E25R
primers and the CCND1F183+R1067 primers to ensure that the
BCAS4-BCAS3 and the CCND1 cDNAs had been protected.
The BCAS4-BCAS3 band was then purified from agarose gel
and cloned into a T-A vector for sequencing verification.


Summary 

We describe two new methods for cloning cDNA ends and a
cDNA protection assay to supersede the RNA protection assay. We
also report that the GSP-primed RT product is neither gene- nor
strand-specific because the RNA sample contains ERP. The
gene-specificity may be improved by adding a linker sequence
to the GSP and then using the linker as a primer in the
ensuing PCR, whereas the strand-specificity may be improved by
using strand-specific DNA oligos as the probe in our protection
assay. Using the CDK4/TSPAN31 relationship as a model, we
find that when sense and antisense RNAs overlap at their 3'
ends, the overlapped sequence might serve as a primer with its
antisense as the template to create a wrong-template extension
in RT or PCR, resulting in a spurious 3' end. This result
suggests to us that two unrelated RNAs or cDNAs that overlap at the
5' or 3' end may also create a chimeric sequence in this way.
Therefore, many chimeric RNAs containing a short
homologous sequence and obtained by approaches involving RT or
PCR may be such artifacts and, thus, need to be rigorously
verified, for example with our protection assay. The ERP and the
5'- or 3'-overlapping antisense together set more complex
pitfalls in the way of RNA cloning, to which peers should be highly
alert. Our methods cannot fully circumvent these traps
but should be good alternative or corrective measures to the
available ones for cloning chimeric or antisense-accompanied
RNA, both together constituting the majority of the cellular
RNA repository.
Abstract

The genes encoding β-actin (ACTB in human or Actb in mouse) and glyceraldehyde-3-phosphate dehydrogenase (GAPDH in
human or Gapdh in mouse) are the two most commonly used references for sample normalization in determination of the
mRNA level of genes of interest by reverse transcription (RT) and ensuing polymerase chain reactions (PCR). In this study,
bioinformatic analyses revealed that ACTB, Actb, GAPDH and Gapdh had 64, 69, 67 and 197 pseudogenes (PGs),
respectively, in the corresponding genome. Most of these PGs are intronless and similar in size to the authentic mRNA.
Alignment of several PGs of these genes with the corresponding mRNA reveals that they are highly homologous. In
contrast, the hypoxanthine phosphoribosyltransferase-1 gene (HPRT1 in human or Hprt in mouse) only had 3 or 1 PG,
respectively, and the mRNA has unique regions for primer design. PCR with cDNA or genomic DNA (gDNA) as templates
revealed that our HPRT1, Hprt and GAPDH primers were specific, whereas our ACTB and Actb primers were not specific
enough both vertically (within the cDNA) and horizontally (comparing cDNA with gDNA). No primers could be designed for
Gapdh that would not mis-prime PGs. Since most of the genome is transcribed, we suggest that peers forgo ACTB
(Actb) and GAPDH (Gapdh) as references in RT-PCR and, if there is no surrogate, use our primers with extra caution. We
also propose a standard operating procedure in which design of primers for RT-PCR starts from avoiding mis-priming of PGs
and all primers need to be tested for specificity with both cDNA and gDNA.
Introduction

Determination of the mRNA level of a gene of interest in
eukaryotic cells often involves conversion of the mRNA to cDNA
by reverse transcription (RT), followed by polymerase chain
reactions (PCR). This RT-PCR approach is much more sensitive
than other methods such as Northern blot, because PCR amplifies
the cDNA in an exponential manner. Very often, the RT-PCR
products need to be compared between two, or among more,
samples to determine whether some sample(s) have a different
mRNA level of the gene of interest from the others [1]. In this
case, a reference gene is needed for sample normalization, i.e. for
ensuring that an equal amount of the RT products from all the
samples is used as template in the PCR. Because RT-PCR is so
sensitive that it can detect the mRNA level even in a single cell,
variation in the expression level of the reference gene needs to be
tightly controlled, otherwise a bias may be produced. Ideally,
expression of the reference gene should be constant in all situations
and should be refractory to all the changes in the experimental
conditions. At least, its expression should not be changed by the
to-be-studied situation.

There is a long list of genes that have been used as
references in RT-PCR [2-6], of which the genes encoding β-actin
(ACTB in human or Actb in mouse, according to the NCBI
nomenclature of genes) and glyceraldehyde-3-phosphate
dehydrogenase (GAPDH in human and Gapdh in mouse) are the two most
frequently used ones [1]. The hypoxanthine
phosphoribosyltransferase-1 gene (HPRT1 in human and Hprt in mouse) is also used often
[5,7,8]. The reason for having so many reference genes is that
none of them really meets the above-mentioned ideal criteria, and
therefore researchers have to select different ones according to
their to-be-studied situations [9-11]. The weaknesses of most
reference genes have been discussed in the literature and pertain
mainly to the stability or variation of their expression in different
situations [12-14], with only very few concerning the influence of
pseudogenes (PGs) on their fidelity as references [15-17]. ACTB,
Actb, GAPDH or Gapdh had been used as references in RT-PCR
long before the human and mouse genomes were fully sequenced.
Although individual PGs of these genes in the human and mouse
were reported a long time ago, these genes continue serving as
references, because some peers do not realize that these genes have
PGs while many others consider that most PGs are not
transcribed, which actually has been a generally accepted concept
for a long time.

According to a report from the human genome project, about
1.1% of the human genomic DNA belongs to exons while 24%
belongs to introns, so that together about a quarter of the genome
is occupied by the protein-coding genes [18]. However, the majority
of the remaining three quarters is not junk and is actually also
transcribed at least in some cell types or at some times [19]. After
sequencing the RNA transcripts from about 1% of the human
genome, the ENCODE pilot project reports that 93% of the bases
in this 1% of the genome are transcribed [20]. The human
genome contains only about 20,000 protein-coding genes but
about 19,000 PGs, although probably less than 20% of the PGs
are commonly transcribed [21]. Like most other non-coding
RNAs, many PG transcripts may be functional, such as in
regulation of the expression of their parental genes [21-23]. There
are even further lessons that in some situations processed PGs are
transcriptionally activated but are mistaken as the turn-on of the
authentic genes that are actually inactive [24-26]. These facts
arouse in us a concern as to whether genetic knockout of
one gene would change its PG expression or, after some latent
period of time, even trigger expression of some of its PGs that are
otherwise silent in some tissues or cell types, since reports on
gene-knockout animals hardly address this aspect. In short, these latest
advances in RNA biology are revolutionary to biomedical science
as they not only challenge the definition of "gene" [27] and many
other fundamental concepts in our mind but also require us to
reevaluate some experimental methodologies. As an example, in
this report we provide bioinformatic data showing that ACTB
(Actb) and GAPDH (Gapdh) have many PGs in the human and
mouse genomes, which may affect the fidelity of these genes as
references for RT-PCR.

Materials  and  Methods 

In the database of the National Center for Biotechnology
Information (NCBI) of the United States, the mRNA sequence of all genes
is presented as DNA sequence, i.e. uracil (U) is replaced by
thymine (T). According to the nomenclature of the NCBI, the
names of human genes should be fully capitalized whereas the
names of mouse genes should have only the first letter capitalized. We
pulled out the mRNA sequences of ACTB, Actb, GAPDH,
Gapdh, HPRT1 and Hprt from the NCBI database; the gene
identification (gi) number and mRNA accession number are
provided in Fig. S1, prior to the corresponding sequence. PGs
were identified using online software and databases as indicated.
An online software (http://biotools.umassmed.edu/bioapps/primer3_www.cgi)
was used for primer design. In silico PCR was
performed with two different online software packages
(http://insilico.ehu.es/PCR/ and http://genome.csdb.cn/cgi-bin/hgPcr).
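
The kind of check that in silico PCR performs can be illustrated with a minimal sketch that scans a target sequence, such as a PG, for sites a primer could bind while tolerating a few mismatches. This is not the algorithm of the online tools cited above; the function names, the mismatch threshold and the example sequences are all hypothetical, and only the strand as written is scanned.

```python
def mismatch_count(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def binding_sites(primer: str, target: str, max_mismatches: int = 2) -> list[tuple[int, int]]:
    """Return (0-based start, mismatches) for every window of `target`
    that matches `primer` with at most `max_mismatches` differences."""
    primer, target = primer.upper(), target.upper()
    n = len(primer)
    hits = []
    for i in range(len(target) - n + 1):
        mm = mismatch_count(primer, target[i:i + n])
        if mm <= max_mismatches:
            hits.append((i, mm))
    return hits


# Hypothetical 6-nt primer against a hypothetical target that holds one
# perfect site and one site with a single mismatch.
primer = "ACGTAC"
target = "TTTTACGTACTTTTACGAACTTTT"
print(binding_sites(primer, target, max_mismatches=1))
```

A primer that returns hits on a PG as well as on the authentic mRNA is a candidate for mis-priming; a reverse primer would be checked against the reverse complement of the target in the same way.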

The cell lines from which data are presented include the GI101A
human breast cancer cell line as well as the Panc-1, Panc-28,
Colo357 and L3.5pL human pancreatic cancer cell lines; all
these cell lines are well documented in the literature. E6E7st
non-transformed and E6E7st/ras transformed human pancreatic
ductal epithelial cell lines were provided by Dr. Paul Campbell
[28]. M8 mouse pancreatic [29] and ND5 mouse breast [30]
cancer cell lines were established by us previously. All cell lines
were cultured with DMEM containing 5% bovine serum and were
harvested when the cells reached about 70% confluence. Isolation
of total RNA from the cells was performed using Trizol
(Invitrogen, Cat. 15596-026; www.invitrogen.com), following the
manual. Genomic DNA (gDNA) was isolated with the traditional
phenol-chloroform method. The gDNA samples were treated with
RNase A whereas the RNA samples were treated with DNase I,
both followed by extraction with phenol and chloroform to
remove the enzyme. The DNA or RNA samples were then
precipitated and washed with ethanol at a final concentration of
70%.


An aliquot of RNA from each cell line was reverse-transcribed
to cDNA with random hexamers and M-MLV Reverse
Transcriptase (Promega, Cat. M1705; www.promega.com), following
the manual. Forty cycles of PCR were performed to ensure that
the reactions entered into the plateau of the amplification of the
authentic cDNA and that possible PGs were detectable. PCR
products were separated in 1% agarose gel, visualized with
ethidium bromide staining, and photographed with Kodak Digital
Campture DC290 Camera under a UV light.

Results 

Identification  of  PGs  of  the  ACTB,  Actb,  GAPDH,  Gapdh, 
HPRT1  and  Hprt 

The mRNA (actually shown as DNA) sequences of ACTB,
Actb, GAPDH, Gapdh, HPRT1 and Hprt, with their gene
identity and mRNA accession numbers, are shown in figure S1. The
GAPDH has an mRNA variant (NM_001256799.1) that is
transcribed from an alternative initiation site and thus differs
from the wild type mRNA (NM_002046.4) at the 5' part (Fig. S1).
Like many RNA transcripts [19], Actb and Hprt mRNAs contain
only a poly-A signal but lack a long poly-A tail (Fig. S1), which is a
reason for us to perform RT with random hexamers. We used
these mRNA sequences, after deleting the poly-A tail from those
having it, as bait to fish out their PGs from the corresponding
(human or mouse) genome in the UCSC Genome Browser
Database (http://genome.ucsc.edu/) by performing a Blat search
[31]. The UCSC Genome Browser scores similarity according to
not only sequence identity but also sequence length, gaps, etc., with
a higher score indicating a generally better similarity. The results
identified 64, 69, 67, 197, 3 and 1 PGs for ACTB, Actb,
GAPDH, Gapdh, HPRT1 and Hprt, respectively (table 1), which
score over 200 and have over 80% identity to the bait. Those
genomic sequences that score less than 200 are not counted,
although they still have over 83% identity to the bait and span
several hundred nucleotides (nt) on the corresponding
chromosome. The details of these PGs, such as their chromosomal
locations, sizes, starting and ending nt, homologues, etc., are shown
in figures 1 and 2 as well as figures S2, S3, S4 and S5. Most of
these PGs are processed, i.e. intronless, as they are similar in size to
the bait. Use of other tools such as Blast of the NCBI database
(http://blast.ncbi.nlm.nih.gov/Blast.cgi), search from other
sources such as the PG database (http://www.pseudogene.org/), or
imposition of different criteria for the cutoff may result in different
numbers of PGs. For instance, another study identified only 56
PGs of the GAPDH and 166 PGs of the Gapdh [32]. However,
the conclusion remains the same that there are many PGs of these
genes in the human and mouse genomes.
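
The counting rule used above (score over 200 and identity over 80%, with the authentic gene excluded) can be expressed as a minimal sketch. The record layout, the function name and the example hits are hypothetical illustrations, not actual Blat output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlatHit:
    """One genomic hit returned for an mRNA bait (hypothetical layout)."""
    chrom: str
    score: int
    identity: float          # percent identity to the bait
    is_authentic: bool = False  # True for the authentic gene itself


def count_putative_pgs(hits: list[BlatHit],
                       min_score: int = 200,
                       min_identity: float = 80.0) -> int:
    """Count hits that pass both cutoffs and are not the authentic gene."""
    return sum(
        1
        for hit in hits
        if not hit.is_authentic
        and hit.score > min_score
        and hit.identity > min_identity
    )


# Hypothetical hits for illustration only.
hits = [
    BlatHit("chrX", 1400, 100.0, is_authentic=True),
    BlatHit("chr4", 650, 88.5),
    BlatHit("chr11", 310, 84.0),
    BlatHit("chr5", 150, 83.2),   # below the score cutoff, not counted
]
print(count_putative_pgs(hits))  # 2
```

Raising or lowering either cutoff changes the count, which is one reason why different studies report different numbers of PGs for the same gene.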


Table 1. Number of putative pseudogenes.

          Human                     Mouse
Gene      ACTB    GAPDH   HPRT1     Actb    Gapdh   Hprt
Number    64      67      3         69      197     1

Note: Only those putative pseudogenes that score over 200 are counted, with
details presented in S-Fig. 2, 3, 4, and 5.
doi:10.1371/journal.pone.0041659.t001
[Figure 1 image: sequence alignment of the human HPRT1 mRNA (hHPRT1) with its putative PGs on Chr4, Chr11 and Chr5, with the forward primer (123-142nd nt) and the reverse primer (664-683rd nt) marked.]

Figure 1. Identification of HPRT1 PGs for primer design. Top panel: A Blat search using the HPRT1 mRNA sequence (1405 bp long after deletion of
the poly-A tail) as the bait pulls out three putative PGs, besides the authentic HPRT1 genomic sequence that spans 40514 nt on the plus strand of the X
chromosome. The three putative PGs match the 193-1383rd, the 269-1413th, and the 213-1143rd nt regions of the HPRT1 mRNA as illustrated. There
are three additional very short fragments, spanning only 28, 35 and 35 bp, respectively, that also match parts of the HPRT1 mRNA but are not considered
as PGs. Bottom panel: We pulled out the sequence of each PG (by clicking "details") and assembled the homologous parts to construct the
"cDNA" of the PGs on chromosomes 11 and 4. Alignment of the three sequences with the HPRT1 mRNA reveals that the HPRT1 mRNA has some
unique regions. The forward and reverse primers (table 2) we designed are underlined.
doi:10.1371/journal.pone.0041659.g001


HPRT1  or  Hprt  primer  design  starting  from  discrimination 
against  PGs 

The Blat search shows that the HPRT1 mRNA, which is a 1405-bp
sequence after its poly-A tail is deleted, spans 40514 nt on the
plus strand of the X chromosome (Fig. 1, top panel). The
93723936-93732367th nt region of the minus strand of
chromosome 11 has 88.3% identity to the 193-1384th nt
region of the HPRT1 mRNA. This putative PG spans 8432 nt and
is thus unprocessed, i.e. it contains intron(s). Another putative PG
on chromosome 5 has 86.2% identity to the 269-1404th nt
region of the HPRT1 mRNA. The third putative PG is on
chromosome 4 and is homologous to the 213-1143rd nt region of
the HPRT1 mRNA; it spans 1274 nt, longer than the 930 bp of the
matched region (213-1143rd), and thus may be unprocessed as well.
There are three other genomic fragments that are highly homologous
to parts of the HPRT1 mRNA but are too small (spanning only 28, 35
and 42 nt, respectively) to be considered as PGs (Fig. 1, top panel).
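
The reasoning used above to call a PG "unprocessed" (its genomic span is longer than the mRNA region it matches) can be written down as a small check. The following is only a rough sketch under that single assumption; the function names are hypothetical, the numbers are those quoted in the text, and insertions other than introns could also lengthen a genomic span.

```python
# Rough heuristic sketch (hypothetical helper names): a putative pseudogene
# (PG) is flagged as possibly unprocessed when its genomic span is longer
# than the mRNA region it matches, i.e. extra sequence must be present.

def inclusive_length(start: int, end: int) -> int:
    """Number of positions from start to end, counting both ends."""
    return end - start + 1


def looks_unprocessed(genomic_span_nt: int, mrna_start: int, mrna_end: int) -> bool:
    """True when the genomic span exceeds the matched mRNA length."""
    return genomic_span_nt > inclusive_length(mrna_start, mrna_end)


# Chromosome 11 PG: genomic coordinates and matched mRNA region from the text.
chr11_span = inclusive_length(93723936, 93732367)
print("chr11 span:", chr11_span)
print("chr11 unprocessed?", looks_unprocessed(chr11_span, 193, 1384))

# Chromosome 4 PG: the text quotes a 1274-nt span for the 213-1143rd nt match.
print("chr4 unprocessed?", looks_unprocessed(1274, 213, 1143))
```

The chromosome 5 PG is left out because the text does not quote its genomic span.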

During the Blat search, we clicked the "details" of each PG shown
in figure 1 to display the whole sequence. For the unprocessed PGs
on chromosomes 11 and 4, we assembled the parts that
are homologous to the HPRT1 mRNA to construct the putative
"cDNA". Alignment of the HPRT1 mRNA with the original
sequence or the assembled sequences of the three PGs revealed
that the 1-192nd nt and the 586-710th nt regions of the HPRT1
mRNA are lacking in the three PGs (Fig. 1, bottom panel). We
designed a forward primer at the 123-142nd nt and a reverse
primer at the 664-683rd nt regions (table 2). The HPRT1 mRNA
also has other unique regions that may be used for primer design
as well (Fig. 1, bottom panel).
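
As a quick sanity check on the forward primer position, the primer can be located in the mRNA by exact string search. The sketch below is illustrative only: the helper name is hypothetical, the template is the first 160 nt of the HPRT1 mRNA as printed in Fig. 1 (split into 10-nt chunks for readability), and the primer sequence is hHPRT1-F123 from table 2.

```python
from typing import Optional, Tuple

# First 160 nt of the HPRT1 mRNA as printed in Fig. 1, in 10-nt chunks.
HPRT1_MRNA_1_160 = "".join([
    "GGCGGGGCCT", "GCTTCTCCTC", "AGCTTCAGGC", "GGCTGCGACG",
    "AGCCCTCAGG", "CGAACCTCTC", "GGCTTTCCCG", "CGCGGCGCCG",
    "CCTCTTGCTG", "CGCCTCCGCC", "TCCTCCTCTG", "CTCCGCCACC",
    "GGCTTCCTCC", "TCCTGAGCAG", "TCAGCCCGCG", "CGCCGGCCGG",
])

FORWARD_PRIMER = "CTTCCTCCTCCTGAGCAGTC"  # hHPRT1-F123 (table 2)
UNIQUE_REGION = (1, 192)  # 1-192nd nt region lacking in the three PGs


def locate(primer: str, template: str) -> Optional[Tuple[int, int]]:
    """Return the 1-based (start, end) of the first exact match, or None."""
    idx = template.find(primer)
    if idx < 0:
        return None
    return idx + 1, idx + len(primer)


site = locate(FORWARD_PRIMER, HPRT1_MRNA_1_160)
print("forward primer site:", site)
if site is not None:
    inside = UNIQUE_REGION[0] <= site[0] and site[1] <= UNIQUE_REGION[1]
    print("inside the 1-192nd nt unique region:", inside)
```

The reverse primer would be searched as its reverse complement; it is not included here because the corresponding mRNA stretch is not part of the 160-nt template.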

Use of the Hprt mRNA sequence as a bait to fish in the mouse
genome identified one putative PG that is located on chromosome 17
and is homologous to the 240-1248th nt region of the Hprt
mRNA (Fig. 2, top panel). We displayed the sequence of this PG
by clicking "details" during the Blat search and assembled the parts
that are homologous to the Hprt mRNA to construct the
"cDNA". Alignment of this "cDNA" with the Hprt mRNA shows
that the 1-240th nt, the 270-535th nt, and several other regions of
the Hprt mRNA are unique to Hprt (Fig. 2, bottom panel). We
designed a forward primer at the 56-75th nt and a reverse primer
at the 482-503rd nt regions (table 2).
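
The alignment step above comes down to counting where the mRNA and the PG "cDNA" differ. The sketch below shows that count for one short, ungapped window; it is not the authors' procedure. The function name is hypothetical, and the two 40-nt strings are copied from the start of the Fig. 2 alignment row that ends at nt 800 of the Hprt mRNA, so they inherit any errors in that extracted text.

```python
from typing import List

# First 40 nt of the mHprt row and of the Chr17 PG row in Fig. 2
# (the row that ends at nt 800 of the Hprt mRNA), in 10-nt chunks.
MHPRT_WINDOW = "".join(["GAAAAGGACC", "TCTCGAAGTG", "TTGGATACAG", "GCCAGACTTT"])
CHR17_WINDOW = "".join(["GAAAATGACC", "TCTCAAAGTG", "TTGGATACAG", "GCAAGACTTT"])


def mismatch_positions(a: str, b: str) -> List[int]:
    """1-based positions at which two equal-length, ungapped sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


positions = mismatch_positions(MHPRT_WINDOW, CHR17_WINDOW)
identity = 100.0 * (len(MHPRT_WINDOW) - len(positions)) / len(MHPRT_WINDOW)
print("mismatches at window positions:", positions)
print("identity in this window: %.1f%%" % identity)
```

A candidate primer site with more mismatches to the PG, especially near the primer's 3' end, would be expected to discriminate better; that weighting is not modelled here.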

Design  of  ACTB,  Actb  and  GAPDH  primers  that 
discriminate  against  PGs 

A quick glance at figure S2 shows that most PGs of ACTB are
similar in length to the ACTB mRNA and are thus intronless.
The ACTB mRNA lacks a unique region, making it difficult to
design primers that would not mis-prime the PGs. We thus pulled
out the sequences of the six best-scored PGs (in the red box in
Fig. S2) and aligned them with the bait sequence. The results
confirm that the ACTB mRNA does not contain a unique part that
is long enough for a primer (Fig. 3). The best regions we could
find for forward and reverse primers that might have some
discrimination against the six PGs are the
1452-1473rd nt and the 1678-1697th nt regions of the ACTB
[Figure 2, bottom panel: alignment of the Hprt mRNA (mHprt; 1349 nt) with the assembled "cDNA" of the putative PG on chromosome 17 (Chr17). The forward primer (56-75th nt) and reverse primer (482-503rd nt) sites are marked on the mHprt row.]

Figure 2. Identification of Hprt PG for primer design. Top panel: A Blat search using the Hprt mRNA sequence as the bait pulls out only one
putative PG, besides the authentic Hprt genomic sequence that spans 33583 bp on the plus strand of the mouse X chromosome. This PG matches the
240-1248th nt of the Hprt mRNA and spans 933 nt on mouse chromosome 17. Bottom panel: We pulled out the PG sequence and assembled
the parts that are homologous to the Hprt mRNA to construct a cDNA. Alignment of the assembled cDNA with the Hprt mRNA reveals that the Hprt
mRNA has several unique regions. The forward and reverse primers (table 2) we designed in some unique regions are underlined.
doi:10.1371/journal.pone.0041659.g002


mRNA, respectively (Fig. 3 and table 2). However, since there are
a total of 64 putative PGs and only six of them were aligned, it
remains possible that these primers match some of the other PGs
better than the six aligned.

Similar to its human counterpart, the mouse Actb mRNA does
not have any unique sequence either, relative to its PGs (Fig. S3).
We pulled out the sequences of the six best-scored PGs (in the red box
in Fig. S3) and aligned them with the Actb mRNA. The results
confirm that the Actb mRNA has no unique part that is long
enough to be a primer (Fig. 4). Nevertheless, we selected the
1471-1489th nt and the 1852-1869th nt regions of the Actb mRNA as
forward and reverse primers, respectively (table 2), which might
discriminate against the six PGs better than the other parts of the
Actb mRNA, although it remains possible that these primers match
some of the other PGs better than the six aligned.

A quick glance at figure S4 shows that the first 26 nt of the
wild type GAPDH mRNA is a unique region.
We pulled out the sequences of the seven best-scored PGs (in the red
box in Fig. S4) and aligned them with the GAPDH mRNA. The
results not only confirm the uniqueness of the 1-26th nt but also
show that the 685-705th nt region has the most mismatches to
most PGs, except the first X-linked PG that only has one nt
mismatched to the GAPDH (Fig. 5). We used these two regions as
the forward and reverse primers, respectively (table 2), in part
because this pair of primers will not amplify variant 2 of
GAPDH (NM_001256799.1), whose transcriptional features are
unknown.

Impossibility  of  designing  Gapdh  specific  primers 

A quick glance at figure S5 shows that the mouse Gapdh has
many processed PGs that are 100% or almost 100% identical to
the Gapdh mRNA. Indeed, alignment of the Gapdh mRNA with the
seven best-scored PGs (in the red box in Fig. S5) showed
mismatches at only a few nt, making it impossible to design any
primer that can discriminate against the PGs (Fig. 6).

Verification  of  the  ACTB,  GAPDH  and  HPRT1  primers 

PCR results showed that the authentic ACTB band of 246 bp
(table 2) in agarose gel was amplified easily, as expected, from the
cDNA samples of a panel of human cell lines (Fig. 7A). Although
both the forward and reverse primers are located in the same exon
(exon 6) and thus should also amplify the authentic gene, the same
band was detected only weakly (not stronger than nonspecific bands)
from the gDNA samples of the same cell lines (Fig. 7A). Thus, this
pair of primers meets our purpose of discriminating against gDNA,
both the authentic gene and the PGs, as discussed later. However,
an additional band about 100 bp larger (~350 bp) than
the 246-bp ACTB cDNA and another of about 650 bp were
also amplified from the cDNA (but not the gDNA) samples,
suggesting that our primers also mis-prime the cDNA of two unknown
genes. As expected, the expression of these genes varied among
cell lines, as manifested by different ratios to the 246-bp ACTB
band that serves as the internal reference in the same cell line
(Fig. 7A). This pair of primers also detected many nonspecific
bands from gDNA samples that differed in size from the 246-bp
band.

The  authentic  band  of  the  wild  type  GAPDH  (700-bp)  was 
detected  only  in  the  cDNA,  but  not  the  gDNA,  samples,  although 
the  primers  also  detected  cDNA  of  two  unknown  genes  at  smaller 
(~500  and  400  bp,  respectively)  sizes  and  detected  several  gDNA 
fragments  of  different  sizes  (Fig.  7B).  Our  HPRT1  primers 
Table 2. Primer information.

Primer Name    Sequence                           Size      Location  Fragment  Region
hACTB-F1452    5'-TTAATAGTCATTCCAAATATGA-3'       22-mers   exon 6    246 bp    1452-1473rd nt
hACTB-R1697    5'-GGGACAAAAAAGGGGGAAGG-3'         20-mers   exon 6              1678-1697th nt
hGAPDH-F6      5'-GAGCCCGCAGCCTCCCGCTT-3'         20-mers   exon 1    700 bp    6-25th nt
hGAPDH-R705    5'-CCCGCGGCCATCACGCCACAG-3'        21-mers   exon 8              685-705th nt
mActb-F1471    5'-GACTTTGTACATTGTTTTG-3'          19-mers   exon 6    382 bp    1471-1489th nt
mActb-R1870    5'-TGCACTTTTATTGGTCTCA-3'          19-mers   exon 6              1870-1852nd nt
hHPRT1-F123    5'-CTTCCTCCTCCTGAGCAGTC-3'         20-mers   exon 1    561 bp    123-142nd nt
hHPRT1-R683    5'-AACACTTCGTGGGGTCCTTT-3'         20-mers   exon 7              664-683rd nt
mHprt1-F56     5'-GGGCTTACCTCACTGCTTTC-3'         20-mers   exon 1    448 bp    56-75th nt
mHprt1-R503    5'-TCTCCACCAATAACTTTTATGTCC-3'     24-mers   exon 4              482-503rd nt

Note: The "h" or "m" in front of each primer's name indicates human or mouse origin. "F" or "R" indicates a forward or reverse primer. The number after "F"
indicates the position of the first nucleotide (nt) of that primer in the mRNA sequence, whereas the number after "R" indicates the position of the last nt of that primer
in the mRNA sequence. Thus, the "R" number minus the "F" number, plus one, gives the size of the DNA fragment amplified by PCR.
doi:10.1371/journal.pone.0041659.t002


amplified only the anticipated band (at 561 bp) from cDNA
samples, although the primers also amplified several bands of
different sizes from gDNA samples (Fig. 7C).
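
The size rule in the note to Table 2 (the "R" number minus the "F" number, plus one) can be checked with a short script. The sketch below is illustrative only: the helper names are hypothetical, and it simply parses the positions out of the primer names and compares the result with the fragment sizes listed in Table 2. Four of the five pairs reproduce the listed size; for the mouse Actb pair the names give 400 bp whereas Table 2 lists 382 bp, so the script reports that pair rather than assuming agreement.

```python
import re

# (forward primer, reverse primer, fragment size listed in Table 2, in bp)
PRIMER_PAIRS = [
    ("hACTB-F1452", "hACTB-R1697", 246),
    ("hGAPDH-F6", "hGAPDH-R705", 700),
    ("mActb-F1471", "mActb-R1870", 382),
    ("hHPRT1-F123", "hHPRT1-R683", 561),
    ("mHprt1-F56", "mHprt1-R503", 448),
]


def position_from_name(primer_name: str) -> int:
    """Extract the mRNA position encoded after '-F' or '-R' in a primer name."""
    match = re.search(r"-[FR](\d+)$", primer_name)
    if match is None:
        raise ValueError("unexpected primer name: " + primer_name)
    return int(match.group(1))


def fragment_size(forward_name: str, reverse_name: str) -> int:
    """Apply the Table 2 rule: R number minus F number, plus one."""
    return position_from_name(reverse_name) - position_from_name(forward_name) + 1


for fwd, rev, listed in PRIMER_PAIRS:
    size = fragment_size(fwd, rev)
    note = "" if size == listed else "  <- differs from the listed size"
    print("%s / %s: computed %d bp, listed %d bp%s" % (fwd, rev, size, listed, note))
```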

We obtained similar results from cDNA and gDNA samples
of many other human cell lines and tissues by using the same
ACTB, GAPDH and HPRT1 primers (data not shown). Based on
our experience, the recommended PCR conditions for these primers
are an initial denaturation at 95°C for 5 min, followed by
cycles of melting at 95°C for 30 sec, primer annealing at 58°C
for 30 sec, and elongation at 72°C for 30 sec. To keep the
reaction in the linear portion, the number of cycles should be
kept below 30, unless a very small
amount of cDNA template is used. The reaction should be
terminated at 72°C for 10 min.
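
The recommended conditions can be written down as a small program description, which also makes the total incubation time easy to estimate. This is only a sketch: the class and function names are hypothetical, the cycle number of 28 is an arbitrary example below the 30-cycle limit mentioned above, and the ramping time between temperatures is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: int
    seconds: int


# Steps as recommended in the text.
INITIAL = Step("initial denaturation", 95, 5 * 60)
CYCLE = (
    Step("melting", 95, 30),
    Step("primer annealing", 58, 30),
    Step("elongation", 72, 30),
)
FINAL = Step("final incubation", 72, 10 * 60)


def hold_time_seconds(n_cycles: int) -> int:
    """Total time spent at the programmed temperatures (ramping excluded)."""
    if n_cycles < 1:
        raise ValueError("n_cycles must be at least 1")
    per_cycle = sum(step.seconds for step in CYCLE)
    return INITIAL.seconds + n_cycles * per_cycle + FINAL.seconds


example_cycles = 28  # arbitrary example, below the 30-cycle limit
print("hold time for %d cycles: %.1f min"
      % (example_cycles, hold_time_seconds(example_cycles) / 60))
```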

Verification  of  the  Actb  and  Hprt  primers 

Our Hprt primers (table 2) amplified only the anticipated band
from the cDNA samples, without mis-priming gDNA, from the M8
and ND5 mouse cell lines, thus confirming the specificity of the
primers (Fig. 7D and 7E). In contrast, our Actb primers (table 2)
produced not only the authentic Actb cDNA band of 382 bp (star
in Fig. 7D and 7E) but also several other bands that were less
abundant and differed between the cDNA samples from M8 and
ND5 cells, indicating that the primers also mis-prime the cDNA of
other unknown genes whose expression differs between cell
lines. gDNA samples also produced a band that was similar in size
to the Actb cDNA but was very fuzzy, likely because it was a
mixture of different bands, including some that were actually
slightly smaller than the Actb cDNA (arrow in Fig. 7D and 7E).
This fuzzy band or bands are likely derived from some PGs in
addition to the authentic Actb gene, which should be amplified
because both primers are located in the same exon (exon 6). The
Actb primers also amplified other bands from gDNA samples that
differed between the two cell lines (Fig. 7D and 7E). We also
studied many other mouse cell lines and tissues and obtained very
similar results (data not shown).

Discussion 

Our bioinformatic analyses show that the ACTB (Actb) and
GAPDH (Gapdh) genes have 64-197 putative PGs (table 1) that
score over 200 and have over 80% identity to the corresponding
parental mRNA, based on the UCSC Genome Browser [31].
Some additional genomic fragments score lower than 200
and are thus not counted as PGs, but they span several
hundred nt, have over 83% identity to the authentic mRNA, and
may still be mis-primed. If 93% of the bases of the human genome
are transcribed, as is the case for the 1% that has been studied [20],
we may have to accept the new concept that most of the genome is
transcribed, at least in some cell types or at some times. Because
each of these PGs resides at a different chromosomal site from the
authentic gene, any PG that is expressed is controlled by
different transcription-regulatory elements and is thus actually a
different gene. Therefore, before ACTB (Actb) or GAPDH
(Gapdh) can be used as a reference in RT-PCR, it needs to be
confirmed that none of their 64-197 putative PGs is transcribed in
the particular cell (tissue) or situation of interest. To determine
whether the many processed PGs are expressed, the only
strategy we can think of is to clone the RT-PCR products from
each cell line or tissue of interest into a vector and then
sequence a large number of plasmid clones to ensure that none
of the clones has a PG sequence. If some PGs are found to be
expressed, how their transcription is regulated needs to be
determined, using another gene as the reference,
so as to establish whether they meet the criteria of a reference
gene, including refractoriness to the situation to be studied.
Needless to say, it is practically impossible to perform such a
tedious and cumbersome sideshow to determine the expression
status of so many PGs and their transcriptional
features, especially when a study involves multiple cell lines
(tissues) or multiple experimental situations. It is much simpler to
forgo these genes and select another one, such as HPRT1 or
Hprt.

In eukaryotic genomes, 2.7-97.7% of the genes are intronless
[33], such as about 50% of the G protein coupled receptor genes
in the human [34], while many other genes have processed PGs.
RT-PCR amplification of the RNA transcripts from these two
classes of genes requires a perfect DNase digestion to remove not
only residual genomic DNA but also mitochondrial DNA from
the RNA sample, since some chromosomal genes are highly
homologous to their ancestors in mitochondria [35,36]. In our
experience, complete DNase digestion that leaves no
traceable DNA residue in the RNA sample is actually not easy,
because PCR is supersensitive. PG may cause artifact without
[Figure 3: alignment of the ACTB mRNA (hActin; 1804 nt) with the six best-scored PGs (Chr5, Chr2+, Chr2-, Chr1, Chr18 and Chr6). The forward primer (1452-1473rd nt) and reverse primer (1678-1697th nt) sites are marked on the hActin row.]

GGGAGTGGGTGAAGGCAGCCAGGGCTTACCTGTGCACTGACTTGAGACCAGTTGAATAAAAGTGCACAC - 

GAGAGTGGCTGGAGGCAGCCAGGGCTTACCTGTACTCTGACTTGAGGAGAGTTGGATAAAAGTGCACACCTT 
GAGAGTGGCTGGAGGCAGCCAGGGCTTACCTGTACTCTGACTTGAGGAGAGTTGGATAAAAGTGCACACCTT 
GGGAGTGGGTGGAGGCAGCCAGGGCTTACCTGTACACTGACTTGAGACCAGTTGAATAAAAGTGCGCACCTT 
AGGAGTGGGTAGAGGCAGCCAGGGTTTACCTGTACACTGACTTGAGACCAGCTGAATAAAAGTGCACACCTT 
CGGAGTGGGTGGAGGCAGCCAGGGCTTAACTGTACACTAA  CTTGAGACCAGTTGAATAAAAGTGCA - 


Figure 3. Alignment of the ACTB mRNA with the six PGs that are most homologous to it shows that the ACTB mRNA
(after deletion of the poly-A tail) has no unique region long enough to serve as a primer. The 1452-1473rd nt and the 1678-1697th nt
regions of the ACTB mRNA (underlined) contain more mismatches than other regions and are thus selected as the forward and
reverse primers, respectively.
doi:10.1371/journal.pone.0041659.g003


being transcribed, as exemplified by Gapdh: any residual DNA
that survives the DNase treatment would, theoretically, provide
198 times as many templates (1 parental gene plus 197 PGs), an
excess that is then amplified exponentially in PCR, although
gDNA is usually more difficult to amplify, as discussed later.
This is another reason for forgoing genes with many processed
PGs, such as GAPDH, as a reference, even if the PGs are not
transcribed or specific primers can be designed. We usually
assign the forward and reverse primers to two different exons
with one or more introns in between, because the gDNA is then
either too long to be amplified by PCR or is amplified as a
fragment larger than the cDNA. Some published studies omitted
DNase digestion of RNA samples in part because the primers were
designed in this way, but this may not be ideal: such primers
cannot distinguish the cDNA from intronless genes and processed
PGs, whose amplicons are identical or similar in size to that of
the cDNA.
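
The size argument above can be made concrete with a minimal Python sketch. The function names and the example lengths (a 200-bp cDNA amplicon and introns of 1,500 and 300 bp) are hypothetical and chosen only for illustration; the count of 197 processed PGs is the Gapdh figure cited above. The sketch assumes that a processed PG carries no introns and it ignores small insertions or deletions.

```python
def expected_product_sizes(cdna_amplicon_len, intron_lens):
    """Expected PCR product sizes (bp) for primers placed in two different exons.

    cdna_amplicon_len: product length on the spliced cDNA (hypothetical input).
    intron_lens: lengths of the introns lying between the two primers.
    """
    return {
        # Spliced transcript: introns are absent.
        "cDNA": cdna_amplicon_len,
        # Parental gene in gDNA: introns are still present, so the product is longer.
        "gDNA_parental_gene": cdna_amplicon_len + sum(intron_lens),
        # Processed pseudogene in gDNA: assumed intronless, so the product is
        # the same size as the cDNA product (indels ignored in this sketch).
        "gDNA_processed_PG": cdna_amplicon_len,
    }


def gdna_template_copies(n_processed_pgs):
    """Number of gDNA loci that can serve as template: 1 parental gene plus its PGs."""
    return 1 + n_processed_pgs


sizes = expected_product_sizes(200, [1500, 300])
copies = gdna_template_copies(197)
```

With these illustrative numbers, the parental gene in gDNA would give a 2,000-bp product, whereas a processed PG would give the same 200 bp as the cDNA, which is why exon-spanning primers alone cannot exclude PG-derived products.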

With the above explained, we propose for discussion a standard
operating procedure (SOP) in which the design of primers for
RT-PCR starts with the exclusion of mis-priming PGs, since so many genes


Row labels: mActin, Chr16, ChrX1-, ChrX1+, Chr17, ChrX2-, ChrX2+


CTGTCGAGTC-GCGTCCACCCGCGAGCACAGCTTCTTTGCAGCTCCTTCGTTGCCGGTCCACACCCGCCACCAGTTCGCCATGGATGACGATATCGCTGCGCTGGTCG - TCGACAACGGCTCCGGCATGTGCAAAGCCGGCTTCGCGGGCGACGA 

- GAGTCAGCGTCTACCCGAGAGCACAGCCTTCTTGCAGCTCCTCCGTTGCCGGTCCACACCC - CCAGTTCGCCATGGATGACGATGCCGCTGCGCTCGTTG - TCGACAACGGCTCCGGC - TTCGCAGGCGACAA 

- TCGAGTTCGCGTCCACACGCGAGCACAGCCTCTTTGCAGCTCCTCCGTCGCTGGTCCATACCCACCACCAGTTCTCCATGGATGACAATATAACTGTGCTCCTTGCCATCATCAACAATGGCTCCAGCATGTGCAAAGCCGGCTTCGCAGGTGACAA 

- TCGAGTTCGCGTCCACACGCGAGCACAGCCTCTTTGCAGCTCCTCCGTCGCTGGTCCATACCCACCACCAGTTCTCCATGGATGACAATATAACTGTGCTCCTTGCCATCATCAACAATGGCTCCAGCATGTGCAAAGCCGGCTTCGCAGGTGACAA 


160 


-CATGGATGACAATATATCGCTGCGCTCCTCCTC — TCGACAACTGCTCCTGCATGTGCAAAGGCAGCCTCGCGGCC - GA 

- GCATGTGCAAAGCTGGCTTTGTGAGCAATGA 


Row labels: mActin, Chr16, ChrX1-, ChrX1+, Chr17, ChrX2-, ChrX2+


TGCTCCCC - GGGCTGTATTCCCCTCCATCGTGGGCCGCCCTAGGCACCAGGGTGTGATGGTGGGAATGGGTCAGAAGGACTCCTATGTGGGTGACGAGGCCCAGAGCAAGAGAGGTATCCTGACCCTGAAGTACCCCAT - TGAA - CATGGCA 

TGCTCCCCTCCCCGGGCCGTCTTCCCCTCCATCGTGGGCTGCCCTAGGCACCAGGGTGTGATGGTGGGTATGGGTCAGAAGGACTCCTGCATGGGCGACGAGGCCCAGAGCAAGAGAGGCATCCTGACCCTGAAGTACCCCAT - TGAC - CATGGCA 

TGCTC - ATCTTCCCCTCCATCGTGGGCCACCCTAGGCACTAGAGAGTGATGGTGGGTGTGGGTCAGATGGACTCCTACATGG-CGACGAGGCCCAGAGCAAGAGAGGCATCCCGACCCTGAAGTACCCCAT - TGAA - CAAGGCG 

TGCTC - ATCTTCCCCTCCATCGTGGGCCACCCTAGGCACTAGAGAGTGATGGTGGGTGTGGGTCAGATGGACTCCTACATGG-CGACGAGGCCCAGAGCAAGAGAGGCATCCCGACCCTGAAGTACCCCAT - TGAA - CAAGGCG 

- TTCCCCTCCATCGTGGGCCTCCCTAGGCACCAGGGTATGATGGTGGGTATGGGTCAGAGGGACTCTTATGTGGGCGACGAGGCCCAAAGCAAGAGAGGCTTCCTGACCCTGAAGTACCCCAT - TGAA - CATGGCA 

TGCTCCCT - GGGCCGTCATCCCCTCCATCGTGGGCTGCCCTGGGCACTAGGATGTGATGGTGGGTATGGGTCAGAAGGACTCCTACGTGGGCGATGAGGCCCAGAGCAAGAGAGGTATCCCGACCCTGAAGTACCCCAT - TGAA - CACTGTA 

TGCTCCCT - GGGTCATCTTTCCCTCCATCTTGGGCTGTCCTAAGCCCCAGGGCATGATGGTGGGTATGGATCAGAAGGACTCCTACATGGACAACGGTGTTCAACGCAAGAGAGGCATCCTGACTCTAAAATACCCCATACATAAAATACACAGCA 


320 


Row labels: mActin, Chr16, ChrX1-, ChrX1+, Chr17, ChrX2-, ChrX2+


TTGTTACCAACTGGGACGACATGGAGAAGATCTGGCACCACACCTTCTACAATGAGCTGCGTGTGGCCCCTGAGGAGCACCCTGTGCTGCTCACCGAGGCCCCCCTGAACCCTAAGGCCAACCGTGAAAAGATGACCCAGATCATGTTTGAGACCTTCAA 

TTGTCACCAACTGGAACGACATGGAGAAGATCTGGCACCACACCTTCTACAGTGAGCTGCGTGTGGCC - CTGTGCTGCTCACCAAGGCCCCC - CTAAGGCCAACAGTGAAAAGATGACCCAGATCATGTTTGAGACC - AA 

TTGTCACCACCTGGGACGATATGGAGAAGATCTGGCACCACACTTTCTACAATGAGCTGCGTGGGGCTCCTGAGGAGCACCCTGTGCTGCTCACCGAGGCCCCCCTGAACCCTAAGGCCAACCGTGAAAAGATGACCCTGATCATGTTTGAGACCTTCAA 

TTGTCACCACCTGGGACGATATGGAGAAGATCTGGCACCACACTTTCTACAATGAGCTGCGTGGGGCTCCTGAGGAGCACCCTGTGCTGCTCACCGAGGCCCCCCTGAACCCTAAGGCCAACCGTGAAAAGATGACCCTGATCATGTTTGAGACCTTCAA 

TTGTCACCAACTGGGACAATATAAAGAAGGTCTGGCACCATACCTTTTACAATGAGCTGCGTGTGGCCCCTGAGGAGTACCCTGTACTGCTCACTGAAGGCCCCCTGAACCCTAAGGCCAACCGTGAAAAGATGACCCAGATCATGTTTGAGACCTTCAA 


CTGTCACCAACTGGGACGATATGGAGAAGATCTGGCACCACACCTTCTACAATGAACTGTGTGTGGCCCCTGGGGAGCACCCTGTGCTGCTCACCAAGGCC— CCTGAACCCTTGGACCAACCGTG - AGACCTTTAA 

TTGCCACCAACTGGAAAAATATGGAGAAGATCTGGCACCGCAACTTTTACAA - GCTGCATGTGGCATCTGAGGTGTACCCTGTGCTGCTCACCAAGTCCTCCCTGAACCTTATGGCCAACCATG - AGAACTTTAA 


480 


Row labels: mActin, Chr16, ChrX1-, ChrX1+, Chr17, ChrX2-, ChrX2+


CACCCCAGCCATGTACGTAGCCATCCAGGCTGTGCTGTCCCTGTATGCCTCTGGTCGTACCACAGGCATTGTGATGGACTCCGGAGACGGGGTCACCCACACTGTGCCCATCTACGAGGGCTATGCTCTC — CCTCACGCCATCCTGCGTCTGGACCTGG 
TACCCCAGCCATGTCTGTAGCCATCCAGGCTGTGCTGTCCCTGTATGCCTCTGGTCGTACCACTGGCATTGTGATGGACTCCGGAGATGGGGTCACCCACACTGTGCCCATCTACGAGGGCTATGCTCTCTCCCTCACGCCATCCTGCGTCTGGACCTGG 
CACCCCAGCCTTGTCCGTAGCCATCCAGGCTGTGCTATCCCTGAAAGCCTCTGGTCGTACCACTGGCATTGTGATGGACTCCGGAGACGTGGTCACCCACACTGTGCCCATCTACGAGGGCTATGCTCTCT--CTCACCCCATCCTGCGCCTGGACCAGG 
CACCCCAGCCTTGTCCGTAGCCATCCAGGCTGTGCTATCCCTGAAAGCCTCTGGTCGTACCACTGGCATTGTGATGGACTCCGGAGACGTGGTCACCCACACTGTGCCCATCTACGAGGGCTATGCTCTCT— CTCACCCCATCCTGCGCCTGGACCAGG 

CATCCCAGCCATGTACAACGCCATCCAGGCTGTGTTGTCCCTATATGCATCTGGTTGTACCACTGGTATTTTGATGGACTTCAGAGATG-GGTTACCCACATTATGCCCATCTACGAGGGCTATG - CCATGCTGTGTCTGGACCTGG 

CACCCCAGCCACATACGTAGCCATCCAGGCTGTGTTGTCTCTATATGTCTCTGGTCCTACTACTGGCATTGCAATGGACTCTGGAGACAGGGTCACCCACACTGTGCCTATCTGTGAGGGCTATGCTCTCC — CTCATTCCATCCTGTGTCTGGACCTGG 
CACCCCAGCCACATACGTGGCCATCCAGGCTGTGCTGTCTCTGTATGCCCCTGGTCCTACCACTGGCATTGTGATGGACTCTGGAGACAGGCTCACCCACACTGTGTCCATCTATAAGGGCTATGCTATCC — CTAATGCAGTCTTCCATTTGGACTTGG 


640 


Row labels: mActin, Chr16, ChrX1-, ChrX1+, Chr17, ChrX2-, ChrX2+


CTGGCCGGG— ACCTGACAGACTACCTCATGAAGATCCTGACCGAGCGTGGCTACAGCTTCACCACCACAGCTGAGAGGGAAATCGTGCGTGACATCAAAGAGAAGCTGTGCTATGTTGCTCTAGACTTCGAGCAGGAGATGGCCACTGCCGCA— -TCC 

CTGGCCGGG — ACCTGACAGACTACCTCATGAAGATCCTGACTGAGCATGGCTACAGCTTCACCACCACAGCTGAGAGGGAAATCGTGCGTGACATCAAAGAGAAGCTGTGCTATGTTGCCCTAGACTTCGAGCAGGAGATGGCCACGGCCGCA - TCC 

CTGGCTGTGTGACCTGACGGACTACCTCATGAAGATCCTGACCGAGCGTGGCTACAGCTTCATCACCCCAGCTGAGAGGGTAATCGTGCGTGACATTAAAGAGAAGCTGTGCTGTGTTGCCCTAGACTTTGAGCAGGAGATGGCCACTGCCACA - TCC 

CTGGCTGTGTGACCTGACGGACTACCTCATGAAGATCCTGACCGAGCGTGGCTACAGCTTCATCACCCCAGCTGAGAGGGTAATCGTGCGTGACATTAAAGAGAAGCTGTGCTGTGTTGCCCTAGACTTTGAGCAGGAGATGGCCACTGCCACA - TCC 

CTGGCCGGG— ACCTGACAGTGTACCTCATGAAGATCCTGACCCAGCGTGGCTACAGCTTCACCAACATGGCTGAGAGGGAAATCATGCGTGACA - AAGAGAAGCTGTGCTATGTTGCCCTAGACTTTGAGCAGGAGATGGCCACTGCCGCA - TTC 

CTGGTCTGA— ACCTGACAGACTACCTCAAGAAGATCCTGACAGAGCGTGGCTACAGCTTTACCACCATAGCTGAGAGGGAAATTGTGTGTGACATCAAAGAGAAGCCGTGCTATGTTGCCCTAGATTTCAAGCAGGAGATGACCACTGCCGCAAGGTCC 
CTGGCTTGC--ATCTGACAGACTACTTCATGAAGATCCTGACCCAGAGTAGCTACAGCCTCATCACCACAGCTGAAGGAGAAATTGAGTGTGACATCAAAGAGAAGCTGTGCTATGTTGCCCTAGATTTCAAGCGGGAGATGGCCACTGCAGCA— TCC 


800 


Row labels: mActin, Chr16, ChrX1-, ChrX1+, Chr17, ChrX2-, ChrX2+


TCTTCCTCCCTGGAGAAGAGCTATGAGCTGCCTGACGGCCAGGTCATCACTATTGGCAACGAGCGGTTCCGATGCCCT-GAGGCTCTTTTCCAGCCTTCCTTCTTGGGTATGGAATCCTGTGGCATCCATGAAACTACATTCAATTCCATCATGAAGTGT 

TCTTCTTCCCTGGAGAAGAGCTATGAGCTGCCTGACGGCCAGGTCATCACTATTGGCAATGAGCGGTTCCGATACCCT-GAGGCTCTTTTCCAGCCTTCCTTCTTGGGTATGGAATCCTGTGGCATCCACGAAACTACATTCAATTCCATCATGAAGTGT 

TCTTCCTCCCTGGAGAAGACCTATGAGCTGCTTGATGGCCAGGTCATCACTATTGGTAATGAGAAGTTCTGCTGCCCT-GAGGCTCTTTTCCAGCCTTTCTTCTTGGGTATGGAATCCTGTGGCATCCATGAAACTACATTCAATTCCACCATGAAGTGT 

TCTTCCTCCCTGGAGAAGACCTATGAGCTGCTTGATGGCCAGGTCATCACTATTGGTAATGAGAAGTTCTGCTGCCCT-GAGGCTCTTTTCCAGCCTTTCTTCTTGGGTATGGAATCCTGTGGCATCCATGAAACTACATTCAATTCCACCATGAAGTGT 

TCTTCTTCCCTGGAGAAGAGCTATAAGCTGCCTGACAGACAGGTCATCACTATTGGCAATAAGCGGTTCCAATGCCCT-GGGGCTCTTTTCCAGCCTTCCTTCTTGAGTATGGAATCCTGTGGCATCCATGAAACTACATTCAATTCCATCATGAAGTAT 

TCTTCCTCCCTCGAGAAGAAGTATGAGTTGCCTGACCACCAGGTCATCACTATTGGCAATGAGCAGTTCTGCTGCCCC-AAGGCTCTTCTCCAGTCTTCCTTCTTGGGTATAGAATCCTGTGGTATCCATGAAACTACATTCCATTCCATCATGAAGTGT 

TCTTCCTCCCTGGAGAAGAGCTATGAGCTGCCTGTTGGCCAGGTCATTACTATTGGCAATGAGCAGTTCTGCTGCCCCTGAGGTTCCTTTCCAGCCTTCCTTCTTGGGCATGGAATACTGTGGTATCCATGAAACTACATTAAGTTCCATCAGGAAGTGT 


960 


Row labels: mActin, Chr16, ChrX1-, ChrX1+, Chr17, ChrX2-, ChrX2+


GACGTTGACATCCGTAAAGACCTCTATGCCAACACAGTGCTGTCTGGTGGTACCACCAT - GTACC - CAGGCATTGCTGACAGGATGCAGAAGGAGATTACTGCTCTGGCTCCTAGCACCATGAAGATCAAGATCATTG 

GACGTTGACACC - AAGATCTCTATGCCAACACAGTGCTGTCTGGTGATACCACCAT - GTACC - CAGGCATTGCTGACAGGATGCAGAAGGAGATTACTGCTCTGGCTCCTAGCACCATGAAGATCAAGATCATTG 

GACTTTGACATCAGGAAAGACCTCTATGCCAACAGAGTGCTGTCTGGTGCCACCACTATACCCATTGCTGACAGGATGCACCACCATGTACACATTGCTGACAGGATGTAGAAGGAGATTGCTGCTCTGGCTCCTAGCACCATGAAGATCAAGAGCATTG 

GACTTTGACATCAGGAAAGACCTCTATGCCAACAGAGTGCTGTCTGGTGCCACCACTATACCCATTGCTGACAGGATGCACCACCATGTACACATTGCTGACAGGATGTAGAAGGAGATTGCTGCTCTGGCTCCTAGCACCATGAAGATCAAGAGCATTG 

GACATTGACATTTGTAAAGACTTCTATGCCAACACAGTGCTATCTGGTAGCACCACTAT - GCACC - CAAACATTGCTGACAGGATGCAGAAGGAGATCACTGCTCTAGATCCTAGCACCAAGAAGACCAAGATCATTG 

GACGTTGACATCCATAAAGACCTCTATGACAACACAGTGCTGTCTGGTGGCACCACCAT - GTACC - CAGGCATTGCTGACCTGATGCA GGAGATCACTGCTCTGTCTCCTAGCACCATGAAGATCAAGATGATTG 

GACGCTGACATCCGTAAAGACCTTTATGACAACACAGTGCTGTCTGGTGGCACCACCAT - GTACC - CAGGCATTGCTGACCTGATGCA— GGAGATCACTGCTCTGGCTCCTAGCACCATGAAGATCAAGATAATTG 


1120 


Row labels: mActin, Chr16, ChrX1-, ChrX1+, Chr17, ChrX2-, ChrX2+


CTCCTCCTGAGCGCAAGTACTCTGTGTGGATCGGTGGCTCCATCCTGGCCTCACTGTCCACCTTCCAGCAGATGTGGATCAGCAAGCAGGAGTACGATGAGTCCGGCCCCTCCATCGTGCACCGCAAGTGCTTCTAGGCGGACTGTTACTGAGCTGCGTT 

CTCCTCCTGAGCGCAAGTACTCTGTGTGGATCGATGGCTCCATCCTGGCCTCACTGTCCACCTTCCAGCAGAT-CGGCTCAGCAAGCAGGAGTAGGATGAGTCTGGCCCCTCCATCGTGCACCGCAAATGCTTCTAGGCGGACTGTT - 

CTCCTCTTGAGCACAA - TAAATGGATCAGTGGCTCCATCCTGGCCTCACTGTCCACCTTCCAGCAGATGTGGATCAGCAAGCAG - TACAATGAGTCTGGCCCCTCCATCGTGCACTGCAAATGCTTCTAGGCAAACTGTTACTGAGCTGCGTT 

CTCCTCTTGAGCACAA - TAAATGGATCAGTGGCTCCATCCTGGCCTCACTGTCCACCTTCCAGCAGATGTGGATCAGCAAGCAG - TACAATGAGTCTGGCCCCTCCATCGTGCACTGCAAATGCTTCTAGGCAAACTGTTACTGAGCTGCGTT 

CTCCTCCTAAGCTCAGGTACTCTGTGTAGATCAGTGGCTCCATCCTGGCCTC - CACC  TTCCAGC AGACGTGGATCAG - GAGTCTGGCCTCTCCATCATGCACCACAAATGTTTCTAGGCAGACTGTTACTGAGCTGTGTT 

CTCCTTCTGAGCGCAAGTACTCTGTGTGGATCGGTGCCTCCATCCTGGCCTCACTGACCACCTTCTAGCAGATGTGGATCAGCAAGCAGGAGTACGATGAGTCCGGTCACTCCATCGTGCACCACAAATGCTTCTAGGTGGACTGTTATTGAGCTTCATT 

CTCTTTCTGAGCACAAGTACTCTGTGTGGATCAGTGGCTCCATCCTGGCCTCACTGACCACCTTCTAGCAGATGTGGTTCAGTAAGCAGGAGTATGATGAATCTGGTCCCTCCATCATGCACTGCAAATGCTTCTAGGTGGACTGTTATTGAGCTGCATT 


1280 


Row labels (for each of the next two alignment blocks): mActin, Chr16, ChrX1-, ChrX1+, Chr17, ChrX2-, ChrX2+


TTACACCCTTTCTTTGACAAAACCTAACTTGCGCAGAA - AAAAAAAAAATAAGAG - 

TTACACCCTTTCTTTGACAAAACCTAACTTGTGCAGAA - AAAAAAA - GAGAG - ACAACATTGGCATGGCTTTGTTTTTGT - TTTTTAA - TTTT - 

TTACACCCTTTCTTTGACAAAACCTAACTTGTGCAGAAAAATAAAAAAA - TAAGAG - ACAACATTGGCATGTCTTT-TTAATTT - TTTTAAAAAGGGTGTTTTA - TTGGTTTT - 

TTACACCCTTTCTTTGACAAAACCTAACTTGTGCAGAAAAAT  AAAAAAA - TAAGAG - ACAACATTGGCATGTCTTT-TTAATTT - TTTTAAAAAGGGTGTTTTA - TTGGTTTT— 

TTATGCCCTTTCTTTGACATAACCTAACTTGGGCAG - AAAAAAA - TATGAG - ACAACATTGGCATGGCTTTGTTTGTTT - GTTTGTTGTTGCTCTTTT - TAAATTTTTTA 

TTACACCCTTTCTTTGACAAAATCTAACTTTCACTG - AAAAAA - TAAAAACAAAATAAAAAACAAACAAAAAAAAAAAACCGAGATAACATTGGCATGACTTTATTTGTTT - TGTTTTGTT - TTGTTTTT - 

TTCCACCCTTTCTTTGACAAAATCTAACTTTCACAG - AAAAAA - TAAAAA - 


Forward primer (1471-1489th nt)

- TTTTTTAAGTTTTTTTGTTTTGTTTTGGCGCTTTTGACTCAGGATTTAAAAACTGG - AACGGTGAAGGCGACAGCAGTTGGTTGGAGCAAACATCCCCCAAAGTTCTACAAATGTGGCTGAGGACTTTGTACATTGTTTTGT 

- TTTTAAAAGGTTTTTTGTTTTGTTTTGGCGCTTTTGACTCAGGATTTAAAAACTGG - AACGGTGAAGGCGAAGGCAGTCGGTTGAAGCAAACATCCCCCAAAGTTCTACAA-TGTGGCTGAGGACTTTG ATTGTACA— 

— GTTTGTT-TTTTGTTTTCTTTTGTTTTGGCGCTTCTGAGTCAGGATTTAAAAACTGG - AAAGGTGAAG - GTGACAGCAGTCAGT - TTGTACGA-  TGTGGCTGAGGACTTTG - ATTGTACA— 

— GTTTGTT-TTTTGTTTTCTTTTGTTTTGGCGCTTCTGAGTCAGGATTTAAAAACTGG - AAAGGTGAAG - GTGACAGCAGTCAGT - TTGTACAA-TGTGGCTGAGGACTTTG— ATTGTACA— 

AAGTTTTTTGTTTTGTTTTGTTTTGTTTTGGCGCTTTAGACTCAGGATTTTAAAACTGG - CATGGTGAAGC - GGTCGGTTGGAGCAAACATCCCCCAAAGTCCTACAG-TGTGGTTGAGCAGTTTG - ATTGTGCA-- 

- ATTTTTTTTAATTGTTGTTTGGTTTTGGTGCTTTGGACTCGAGATTTAAAAACTGGAACAGTGAAGGCAACAAGCAGTGAAGGTGACAGCAGTCGGTTGGAGCAAACATCCCCCAAAGTTCTACAA-GGTGGCTGAGGATTTGG - ATTGTACATT 

- TTGTTTGGTTTTGGTGCTTTTGACTCGAGATTTAAAAACTG - AACAGTAAAGGAGACAGCAGATGGTTAGAGCAAGCATCCCC-AAAGTT - 


1440 


nt) 

1600 


Row labels: mActin, Chr16, ChrX1-, ChrX1+, Chr17, ChrX2-, ChrX2+


•TTTTTTTTTTTTTTGGTTTTGTCTTTTT - TTAATAGTCATTCCAAGTATCCATGAAATAAGTGGTTACAGGAAGTCCCTCACCCTCCCAAAAGCCACCCCCACTCCTAAGAGGAGGATGGTCGCG 

- TTGTTTTTTTGATTTTGTCTTTTT - TTAATAGTCATCCCAAGTATCCATGAAATAAGTGGTTACAGGAAGTCCCTCACCCTCCCAAAAGCCACCCCCACTCCTAAGAAGAGGATGGCCGAG 

- TTATTTTTTTGGTTTTGTTTT - TTAATAGTCATTACAAGTATCCATGAAATAAGTGGTTAAAGGAAGACCCTTACCCTCC-AAAAGCCACTCCCACGCCTAAGAGAAGGATGGACAAG 

- TTATTTTTTTGGTTTTGTTTT - TTAATAGTCATTACAAGTATCCATGAAATAAGTGGTTAAAGGAAGACCCTTACCCTCC-AAAAGCCACTCCCACGCCTAAGAGAAGGATGGACAAG 


GTTTTTTTTTTTTTTTTTTTGGTTTTGTTTT 
- TXT 


-TTAATAGTCATTCCAAGTATCCATTAAATACATAGTTACAAGAAGTCCCTCACCCTCCCAAAAGCCACCCCCACTCCTAAGAGGAGGATAGCCAAG 

■TTAATAGTCACTCCACATATTTATTAAAGATA-GACTATAGGAAGTTCCTCACCCTCCCAAAAGCTACTCCCACTCTTGAGAGGAGGATGGCTGAG 


1760 


mActin  TCCATGCCCTGAGTCCACCCCGGGGAAGGTGACAGCATTGCTTCTGTGTAAATTATGTACTGCAAAAATTTTTTTAAATCTTCCGCCTTAATACTTC ATTTTT-GTTTTTAATTTCTGAATGGCCCAGGTC-TGAGGCCTCCCTTTTTTTT— -GTC  1920 

Ch  r 1 6  TCCACGCCCTGAGTCCACCCCGGGGAAGGTGACAGCATTGCTTCTGTGTAAATTATGTACTGCAAAAATTTTTT- AAATCTTCTGCCTTAATACTTC— ATTTTT-GTTTTTAATTTCTGAATGGTC-AGCTAGTGTGGCCCCCCTTTTT- -T - GTC 

ChrXl-  TCCACACCCTGAGTCCACACCGGGGAAGGTGACATCAGTGGGTCTGTGTAAATTATGTACTGCAAAATTTTTTTTTAATCTTCTGCCTTAATGCTTC - ATTTTT-GTATTTAATTTCTGACTG-TC-AGCCATCGTGGCTCCCCCTTTT — T - GGC 

ChrXl+  TCCACACCCTGAGTCCACACCGGGGAAGGTGACATCAGTGGGTCTGTGTAAATTATGTACTGCAAAATTTTTTTTTAATCTTCTGCCTTAATGCTTC - ATTTTT-GTATTTAATTTCTGACTG-TC-AGCCATCGTGGCTCCCCCTTTT — T - GGC 

Chrl7  TCCACGCCCTGAGTCCACACTGGGCAAGGTGACAGCATTGCTTCTGTGTAAATTATGTACTGCAAAAGTTTTTTTAAATCTTCCACAGTAATACTTCCTCATTTTT-GTTTTTAATTTCTGAATGGTC-AGCCATCATGGCCCCCCTTTTT— T— -G-T 
ChrX2-  TCCATACCCTGAGTCCATACCAGGAAATGTGGCAGCATAGCTTCTGTATAAATTATATACCGCAAATTCTTTTTTAAAGCTTCCACCTTAATACTTCTTCACTTTT-ATTTTAAATTTCTGAATGATC-AGCCATTGTGTCTCTTCTTTTTTGTCCCCCC 

ChrX2+  TCCA - AGGTGACAGCATTACTTCTGTGTAATTTATGTACTG-AAATATTTTTAAAAACCTTCTTCCTTAATATTTCTTCATTTTTTGTTTTTAGTTTTTGAATGGTA-AGTCATTGTGA - TTCTTTTGAATTTGGCT 

Reverse  primer  (1852-1870th  nt) 

mActin  CCCCCAACTTGATGTATGAAGGCTTTGGTCTCCCTGGGAGGGGGTTGAGGTGTTGAGGCAGCCAGGGCTGGCCTGTACACTGACTTGAGACCA- - - - -ATAAAAGTGCACACCTTACCTTACACAAAC -  1889 

Chrl6  CCCCCAACTTGATGTATGAAGGCTTTGGTCTCTCTGGGAGTGAGTTGAGGTGTTGAGGCAGCCAGGGCTGGCCTGTACACTGACTTGAGACCATTTTAATAAAAGTGCACACCTTACC - 

ChrXl-  CCCCCAACTTGATGTATGAAGGCTTTGGTCTCCCTAGGAGTGGGTTGAGGTGTTGAGGCAGCCAGGGCTGGCCTGTACACTGACTTAAGACCATTTTAATAGAAGTGCACACCTTAC - 

ChrXl+  CCCCCAACTTGATGTATGAAGGCTTTGGTCTCCCTAGGAGTGGGTTGAGGTGTTGAGGCAGCCAGGGCTGGCCTGTACACTGACTTAAGACCATTTTAATAGAAGTGCACACCTTAC - AAACACACAAAC 

Chrl7  CCCCCAGCTTGATGTATAAAGGCTTTGGTTTCTCCAGAAGCAGGTTGAGGTGTTGAGGCAGCCAGGGCTGGCCTGGACACTGACTTAAGACCATTTTAATAAATGCACACACCTTAC - AAACAAAC - 

ChrX2-  CCCCCAACTTGATATATGAAGGCTTTGGTCACCCTGGGAATGGGTTGAGGTGTTGAGGCAGCCCGGGTTGGCCTGTACACTGACTTGAGACGATTTTAATAAAAGTGCACACC - 

ChrX2+  CCACCAATTTGTTGTATGAAAGCTTTAATCTTCCTGGGAGTGGCTTGAGGTGTTGAGGCA - GCCTGTACAGTGACTTGAGATCATTTTAATAAAAGTGGACACC - 


Figure 4. Alignment of the Actb mRNA with the six PGs that are most homologous to it shows that the Actb mRNA
has no unique region long enough to serve as a primer. The 1471-1489th nt and 1852-1870th nt regions of the Actb mRNA (underlined)
have more mismatches than other regions and are thus used as the forward and reverse primers, respectively.
doi:10.1371/journal.pone.0041659.g004
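
The selection of the region with the most mismatches can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure actually used to produce the figure: the alignment rows are assumed to be equal-length strings with '-' marking gaps, a gap in a PG row is counted as a mismatch, windows are scored by the total number of mismatches summed over all PGs, and the function names and toy sequences are hypothetical.

```python
GAP = "-"


def mismatches_per_column(mrna_row, pg_rows):
    """For each alignment column, count the PG rows that differ from the mRNA row."""
    if any(len(row) != len(mrna_row) for row in pg_rows):
        raise ValueError("all alignment rows must have the same length")
    return [
        sum(1 for row in pg_rows if row[i] != base)
        for i, base in enumerate(mrna_row)
    ]


def best_primer_window(mrna_row, pg_rows, length):
    """Return (start_column, total_mismatches) of the best window, or None.

    Windows containing a gap in the mRNA row are skipped, because a primer
    must be a contiguous stretch of the mRNA. Ties go to the earliest window.
    """
    counts = mismatches_per_column(mrna_row, pg_rows)
    best = None
    for start in range(len(mrna_row) - length + 1):
        if GAP in mrna_row[start:start + length]:
            continue
        score = sum(counts[start:start + length])
        if best is None or score > best[1]:
            best = (start, score)
    return best


# Toy alignment (hypothetical sequences, not taken from the figure).
mrna = "AAAACCCCGGGG"
pgs = ["AAAACTTCGGGG", "AAAACCTCGGGA"]
window = best_primer_window(mrna, pgs, 4)
```

A window chosen this way is only a candidate: as the text notes, the whole mRNA sequence should still be run through routine primer-design software. Scoring by the minimum number of mismatches against any single PG, rather than by the total, may be a stricter alternative.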


have them [21]. The simplest way is to use the to-be-studied
mRNA sequence, after removing its poly-A tail if there is one, as
bait to fish out PGs from the corresponding genome. Primers
should be assigned to regions unique to the mRNA, such as the
first 26 nt of the wild-type GAPDH (NM_002046.4). If there is no
unique region, a second step should be taken: align the mRNA
with the PG sequences to identify the regions that contain the
most mismatches, such as the region used as our GAPDH reverse
primer. Once such a region is selected, the whole mRNA
sequence, not just the selected region, should be run through
routine primer-design software to determine whether a
primer can be identified within the selected region. It merits
mention that one type of quantitative PCR technique uses a
gene-specific oligo probe (with a modified backbone), besides the two


Row labels: hGAPDH, ChrX, ChrX2, Chr6-1, Chr5, Chr13, Chr6-2, Chr15


Forward  primer  (6-25th  nt) 

AAATTGAGCCCGCAGCCTCCCGCTTCGCTCTCTGCTCCTCCTGTTCGACAGTCAGCCGCATCTTCTTTTGCGTCGCCAGCCGAGCCACATCGCTCAGACACCATGGGGAAGGTGAAGGTCGGAGTCAACGGATTTGGTCGTATTGGGCGCCTGGT-CACC  160 

- GCTCTCTGCTCCTCCTGTTCTACAGTCAGCCGCATCTTCTTTTGCATCGTCAGCCAAACCACATCCCTGAGACACCATGGGGAAGGTGAAGGTCGGAGTCAACAGATTTGGTCGTACTGGGCGCCTGGT-CACC 

- GCTCTCTGCTCCTCTTGTTCGACAGACAACCACATCTTCTCTTGCATTGCCAGCCG - CATCCATGAGACACCATGGGGAAGGTGAAGGTCGGAGTCAACGGATTTGGTCGTATTAGGCGCCTGGT-CAGC 

- AGCCA - CATCCCTGAGACACCATGGTGAAGGTGAAGGTCAGGGTCAGTGGATTTGGTCGTATTGGGCGCCTGGT-CAGC 

- TCTCTGCTCCTCCCGTTCGACAGACAGCCACATCTTCTCATGCATCACCAGCCG - CATCCCTGAGACACCACGGTGAAGGTGAAGGCTGGAGTCAATGGATTTGGTCGTATTGAACACCTGGT-AACC 

- GCTCCCTGCTTCTCCTGTTCGACAGACAGCCACATTTTCTCTTGCATTGCCAGCCG - CATCCCTGAGACATCATGGTGAAGGTGAAGGCCAGAGTCGACAGATTTGGTCATGTTAGGTGTCTGGT-CACC 

- GCTCTCTGCTCCTCCTGTTTGACAGACAGCTGCATCTTCTCTTGCATTGCCAACCA - CATCCCTGAGACACCATGGGGAAAGTGAAGGTTGGAGTCAATGGATTTGGTTGTATTGGGCGCCTGGT-CATC 

- TCTCTGCTCCTCCTGTTCGACAGACAGCCACATATTCTCTCGCATTGGCAGCTG - CATCCCTGAGACACCATGGGTAAGGTGAAGGTCAGAGTCAACAGATTTGGTCGTACTGGGTGCCTGGTGCACC 


Row labels: hGAPDH, ChrX, ChrX2, Chr6-1, Chr5, Chr13, Chr6-2, Chr15


AGGGCTGCTTTTAACTCTGGTAAAGTGGATATTGTTGCCATCAATGACCCCTTCATTGACCTCAACTACATGGTTTACATGTTCCAATATGATTCCACCCATGGCAAATTCCATGGCACCGTCAAGGCTGAGAACGGGAAGCTTGTCATCAA-TGGAAAT  320 

AGGGCTGCTTTTAACTCTGGTAAAGTGGATATTGTTGCCATCAATGACCCCTTCATTGACCTGAACTACATGGTTTACATGTTCCAATATGACTCCACCCATAGCAAATTCCATGGCACCGTCAAGGCTGAGAATGGGAAGCTTGTCATCAA-TGGAAAT 

AGGGTTGCGTTTAACTCTGTTAAAGTGGATATTGTTGCCATCAATGACCCCTTCATTGACCTCAACTACATGGTCTACATGTTCCAGTATGATTCCACCCATGGCAAATTCCATGGCACCATCAAGGCTGAGAACGGGAAATTTGTCATCAAATGGAAAT 

AGGGCTA-TTTTAACTCTGGTAAAGTGGATATTATCACCATCAGTGACCCCTTCATTGATCTCAAGTACATGGTCTACATGTTCCTGTATGATTCCACCCATGGCAAATTCCATGGCACCATCAAGGCTGAGAACGGGAAACTTGTCATCAA-TGGAAAT 

AAGCCTGCTTTTAACTCTGGTAAAGTGGATATTGTCACCATCAATGACCCCTTCGTTGACCTCAACAACATGGTCTACATGTTCCAGTATGATTCCACCCATGGCAAATTCTGTGGCACCGTCAAGGCTGAGAACAGGAAGCTTGTCGCCAG-TGGAAAT 

AGGGCTGCTTTTAACTCTGCTAAAGTAGATATTGTGGCCATCAGTGACCCCTTCATTGACCTCAACTACATGGTCTACATGTTCCAGTATGGTTCTACCCATGGCAAATTCCATGGCACTGTCAAGGCTGAGAACAGGAAGCTTGTCATCAC-TGGAAAT 

AGGGCTGCTTGTAATTCTGGTAAAGTAGATATTATCGCCATCAATGACCCATTCATTGACCTCAGCTACATGGTCTACATGTTCCAGTATGATCCCACCCATGGTAAATTCCATGGCACCGTCAAGGCTGAGAACGGGAAGCTTGTCATCAA-TGGAAAT 

ATGACTGCTTTTAACTCTGGTAAAGTGGATATTGTTGCCATCAATGACCCCTTCATTGACCTCAACTACATGGTCTACATGTTCCTGTATGATTCTACACATGGCAAATTCCATGGCACCGTCAAGGCTGAGAACGGGACGCTTGTCATCAA-TGGAAAC 


Row labels: hGAPDH, ChrX, ChrX2, Chr6-1, Chr5, Chr13, Chr6-2, Chr15


CCCATCACC-ATCTTCCAGGAGCGAGATCCCTCCAAAATCAAGTGGGGCGATGCTGGCGCTGAGTACGTCGTGGAGTCCACTGGCGTCTTCACCACCATGGAGAAGGCTGGGGCTCATTTGCAGGGGGGAGCCAAAAGGGTCATCATCTCTGCCCCCTCT  480 

CCCATCACC-ATCTTCCAGGAGCGAGATCCCTCCAAAATCAAGTGGGGCGATGCTGGCGCTGAGTACGTCATGGAGTCCACTGGCGTCTTCACCACCATGTAGAAGGCTGGGCCTCATTTGCAGGGGGGAGCCAAAAGGGTCATCATCTCTGCCCCCTCT 

CCCATCACC-ATCTTTCAGGAGCAAGATCCCTCTAAAATCAAGTGGGGCGATGCTGGTGCTGAGTACGTCGTGGAGTCCACTGGCGTCTTCACCACCATGTAGAAGGCTGGGGCTCATTTGCAGGGGGGAGCCAAAAGGGTCATCATCTCTGCCCCCTCT 

CCCATCACC-ATCTTCCAGGAGCGAGATCCCTCCAAAATCAAATGGGGCACTGCTGGTGCTGAGTACATCGTGGAGCCCACCGGCACCTTTACCACCATGGAGAAGACTGGGGCTCACTTGCAGGGAGGAGCCAAAAGGGTCATCATCTCTGCCCCCTCT 

CCCATCACC-GTCTTCCAGG-GCGAGATCCCTCCAAAATCAAATGGGGTGATGCTGGTGCTCAGTACATTGCGGAGTCCACTTGTATCTTCACCACCATGGAGAAGGCTGGGGCTGACTTGCAGGGAGGAGCCAAAAGGGTCATCATCTCTGCCCCCTCT 

CTCATCACC-ATCTTCCAGGAGCGAGATCCCTCCAAAATCAAATGGGGTGATGCTGGCACTGAGTACAGTGTGGAGTCCACCAGCATCTTCACCATCATGGAGAAGTCTGGGGCTCACTTGCAGGGAGGAGCCAAAAGGGTCATCCTTTCTGCCCCCTCT 

CCCATCACT-ATCTTCCAGGAGCCAGATCCCTCCAAAATCAAGTGGGGTGATGCTGGCACTGAGTATGTCATGGAGTCCACTGGCATCTTCACCACCATGGAGAATGCTGGGGCTCACTTGCAGTGGGGAGCCAAAAGGGTCATCATCTCTGCCCCCTCT 

CCCATCACCCATTTTCCAGGACCAAGATCCCTCCAAAATCAAATGGGGTGATGCTGGCGCTGAGTACGTCGTAGAGTTCACTGGTGTCTTCACCACCATGGAGAAGGCTGGGGCTCACTTGCAGGGGGGAGCCAAAAGGGTCAACATCTCTGTCCTCTCT 


Row labels (for each of the next two alignment blocks): hGAPDH, ChrX, ChrX2, Chr6-1, Chr5, Chr13, Chr6-2, Chr15


GCTGATGCCCCCATGTTCGTCATGGGTGTGAACCATGAGAAGTATGACAACAGCCTCAAGATCATCAGCAATGCCTCCTGCACCACCAACTGCTTAGCACCCCTGGCCAAGGTCATCCATGACAACTTTGGTATCGTGGAAGGACTCATGACCACAGTCC  640 
GCTGATGCCCCAATGTTTGTCATGGGTGTGAACCATGAGAAGTATGACAACAACCTCAAGATTGTCGGCAGTGCCTTCTGCACCACCAACTGCTTAGCACCCCTGGCCAAGGTCATCCATGACAACTTTGGTATCGTGGAAGGACTCATGACCATAGTCC 
GCTGATGCCCCCATGTTTGTGATGGGCATGAACCATGAGAAGTATGACAACAACCTCAAGATCGTCAGCAATGCCTCCTGCACCACCAACTACTTAGCGCCCCTGGCCAAGGTCATCCATGACAACTTTGGTATCGTGGAAGGACTCATGACCACAGTCC 
GCTGATGCCTCCGTGTTCATAATGGGTGTGAACAATGAGAAGTATGACAACAGCCTCAAGATCATCAGCAATGCCTCCTGTACCACCAACTGCTTAGCGCCCCGGGCCAAGGTCCTCCATGACAACTTTGGTATCGTGAAAGGACTCATGACCACAGTCC 

GCTGATGCCCCCATGTTCGTGATGGATGTGAACCACGAGAAGTATGACAACAGCCTCAAGATCGTCAGCAATGCCTCCTGCATCACCAACTGCTTAGCGCCCCTGGCCAACGTCATCCA - CAACTTTGGTATCGTGGAAGGACCCATGACCACAGTCC 

GCCAACACCCTGATGTTCGTGATGGGCGTGACCCATGAGAATTATGACAGCAGCCTCAAGATCATCAGCAATGCCTCCTGCACCATGAATTGCTTAGCACCCCTGGCCAAAGTCATCCGTGACAACTTTGGTATCATGGAAGGACTCATGACCACAGTCC 

GCTGATCCCCCCATGTTCACGATGGGTGTGAACCATGAGAAGTATGACAACAGCCTCAAGATCATCAGCAATGCCACCTGAACCACCAACTGCCTAGCGCCCCTGGCCAAAGTCATCACTGACAACTTTGGTATCGTGGAAGGACTCATGACCACAGTCC 

GTTGATGCCCCCATGTTTGTGATGGGCGTGAACCATGAGAAGTATGACAACAGCCTCAAGATCGTCAGCAATGCCTCCTGCATCACCAACTGCTTAGCGCCCC-AGCCAAGGTCATCCATGACAACTTTGGTATCATAGAAGGATTCATGACCACAGTTC 

Reverse  primer  (685-705th  nt) 

ATGCCATCACTGCCACCCAGAAGACTGTGGATGGCCCCTCCGGGAAACTGTGGCGTGATGGCCGCGGGGCTCTCCAGAACATCATCCCTGCCTCTACTGGCGCTGCCAAGGCTGTGGGCAAGGTCATCCCTGAGCTGAACGGGAAGCTCACTGGCATGGC  800 
ACGCCATCACTGCCACCCAGAAGACTGTGGATGGCCCCTCCGGGAAACTGTGGCGTGATGGCCaCGGGGCTCTCCAGAACATCATCCCTGCCTCTACTGGCACTGCGAAGGCTATGGGCAAGGTCATCCCTGAGCTGAACGGGAAGCTCACTGGCATGGC 
ACGCCATCACTGCCACCCAGAAGACTGCGGATGGCCCCTCTGGGAAACTGTGGCaTGAcGGCCaCGGGGCTCTCCAGAACATCATTCCTGCCTCTACTGACGCTGCCAAGGCTGTGGGAAAGGTCATCCCTGAGCTGAACGGGAAGCTCACTGGCATGGC 
ATGCCATCACCGCCACCCAGAAGACTGTGGATGGCCCCTCCAGGAAACTGTGGCaTGAcGGtgGtGGGACTCTCCAGAACATCATCCCTGCCTCTACTGGCACTGCCAAGGCTGTGGGCAAGGTCATCCCTGAGCTGAACGGGAAGCTCACTGGCATGGC 
ATGCTATCACTGCCACCCAGAAGATTGTGGATGGCCCCTCCGGGAAACT — GGCaTaAcGGCCGtGGaGCTCTCCAGAACATCATCCCTGCCTCTACTGGCACTGCCAAAGCTGTGGGCAAGGTCATCCCTGAGCTGAACAGGAAGCTTACTGGCATGGG 
ATGCTGTCCCTGCCACCCAGAAGACTGTGGATGGACCATCCAGGAGACTGTGGtGTGATaGCCatGGGGCTCTCCAGAACATCATCCCTGCCCCTAGTGACGCTGCCAAGGCTGTGGCCAAGGTCATCCCTGAGCTGAATGGGAAGCTCACTGGCATGGC 
ACGCCATCACTGCCACCCAGAAGACTGTGAATGGCCCCTCCAGGAAACTGTGGCGTGAcaGCCaCaGGGCTCTCCAGAACATCATCCCTGCCACTACTGG-GCTGCCAAGGCTGTGAGGAAGGTCATCCCTGAGCTGAATGGTAAGCTCACTGGCATGGC 
ACACCATCACTGCCACCCA - GACTATAAATGGCCCCTCCGGGAAACTGTcaCGTGATGGCtGCaGGGCTCTCCAGGACATCATCCCTGTCTCTACTGGCGCTGCCAAGCCTGTGGGTAAGGTCATCCCTGAGCTGAACGGGAAGCTCACTGGCATGGC 


Row labels: hGAPDH, ChrX, ChrX2, Chr6-1, Chr5, Chr13, Chr6-2, Chr15


CTTCCGTGTCCCCACTGCCAACGTGTCAGTGGTGGACCTGACCTGCCGTCTAGAAAAACCTGCCAAATATGATGACATCAAGAAGGTGGTGAAGCAGGCGTCGGAGGGCCCCCTCAAGGGCATCCTGGGCTACA - CTGAGCACCAGGTGGTCTCCTCT  960 

CTTCTGTGTCCCCACTGC - CGTGTCAGTGGTGGACCTGACCTGCCGTCTGGAAAAACCTGCCAAATATGATGACACCAAGAAGGTGGTGAAGCAGGCATCGGAGGGCCCCCTCAAGGGCATCCTGGGCTACA - CTGAGCACCAGGTGGTCTCCTCC 

CTTCCATGCCCCCACTGCCAATGTGTCAGTGGTGGACCTGACCTGCTGTCTGGAAAACCCTGCCAAATATGACAACATCAAGAAGGTGGTGAAGCAGGCGTTGAAGGCCCCCCTCAAGGGCATCCTGGGCTAAA - CTGAGCACCAGGTGGTCTCCTCC 

CTTCTGTGTCCCCACTGCCAATGTGTCAGTCGTGGACCTGATCTACCATCTGGAAAAACCTACCAAATATCATGGCATCAAGAAGGTGGTGAAGGAGGCATCAGAGGGCTCCCTCTAGGGCATCCTGGGCTACAACACCGAGCACCAGGTTGTCTCCTTC 

ATTCTGTGTCCTCACTGCCAATGTGTTGGTCATGGACCTGACATGCCATCTGGAAAAACCTGCCAAATACAATGACATCAAGAAGGTAGTAAAGCAGGCATCCAACG-CCCCCTCAAGGGCATCCTGGGCTACA - CGAGCACCAGGTGGTCTCCTCT 

CTTCTGTGTCCCCACTGC CAACTTGTCAGTTGTGGATCTGACCTGCTGTCTTGAAAAACCTGCCAAATATGATGACATCAAGAAGGTGGTAAAGCAGGTGTCAGAGGGCCCCCTCAAGGGCATCCTGGGCTACA - CTGAGTACCAGCTTGTCTCCTCT 

CCTCCGTGTCCCCACTGCCAACGTGTCAGTGGTGGACCTCACCTACCATCTGGAAAAACCTGCCAAATATGATGACATCAAGAAGGTGGTGAAGCAGGCATCAGAGGGCCCCCTCAAAGGCATCCTGGGCTACA - CTGGGCACCAGGTGGTCTCCTCC 

CTTCCATGTCCCCACTGCCAATGTGTCAGTGGCGGACCTGACCTGTCGTCTGGAAAAACCTGCCAA-TATGATGACATCAAGAAGGTGGTGAAGTAGGCATCGGAGGGCCCCCTCAAGAGCATCCTGGGCTACAA - TGAGCACCAGATGGTCTCCTTC 


hGAPDH  GACTTCAACAGCGACACCCAC - TCCTCCACCTTTGACGCTGGGGCTGGCATTGCCCTCAACGACCACTTTGTCAAGCTCATTTCCTGGTATGACAACGAATTTGGCTACAGCAACAGGGTGGTGGACCTCATGGCCCACATGGCCTCCAAGGAGT  1120 

ChrX  GACTTCAACAGCAACACCCAC - TCTTCCACCTTCAATGCTGGGGCTGTCATTGCCCTCAACAACCACTTTTTCAAGCTCATTTCCTGGTATGACAATGAATTTGGCTACAGCAACAGGATGGTGGACCTCATGGCCCACATGGCCTCCAAGGAGT 

ChrX2  GATTTCAACAGTGACACCCCC - TCCTCCACCTTCAATGCTGGGGCTGGCATTGCCCTCAACGACCACTTTGTCAAGCTCATTTCCTGGTATGACAATGAATTTGGCTACAGCAACAGGGTGGTGGACCTCATGGCCCACATGGCCTCCAAGGAGT 

Chr6-1  GACTTCAACAGCGACATCCAT - TCTTCCACCTTTGATGTTGGGGCTGGCATTGCCCTCAACAACCACTGTGTCAAGCTCATTTCCTGGTATGAGAATGAATTTGGCTACAGCAACAGGGTGGTGGACGTCATGGCCCACAAGTCCTCCAAGGAGT 

Chr5  GACTTCAACAGCGACACCCACACCTACTCTTCCACCTTCAATGCTGGGGCTGGCACTGCCCTCGATGGCCACTTTGTCAAGCTCGTTTCCTGGTATGACAATGAATTTGGCTACAGCAACAGGGTGGTGAACCTCAGGGCCCACATGGCCGCCAAGGAGT 

Chrl3  GACTTCAACAGCGACACCCAC - TCTTCCACCTTCGACGCTGGGGCTAGCATTGCCCTCAACGACCACTTTTTCAAGCTCATTTCCTGGTATGACAATGAATTTGGCTACAGCAACAGGGTGATGGACCTCATGGCCCACATGTCCTCCAAGGAGT 

Chr6-2  GACTTCAACAGTGACACCCAC - TCTTCCACCTTCAATGCTGGGGCTGGCACTGCCCTCAACGACCACTTTGTCAAGCTCATTTCCTGGTATGACAATGAATTTGGCTACAACAACAGGGTGGTGGACCTCATGGCCCACATGGCCTCCAAGGAGT 

ChrlS  GACTTCAACAGCAACACCCAC - TCTTCTACCTTCGATGCTGGGGCAGCCATTGTCCTCAAGGACCACTCTGTCAAGCTAATTTCCTGGTATGACAATGAATTTGGCTACAGCAACAGGGTGGTGCACCTCATGGCCCACAATGCCTCCAAGGAGT 


Row labels: hGAPDH, ChrX, ChrX2, Chr6-1, Chr5, Chr13, Chr6-2, Chr15


AAGACCCCTGGACCACCAGCCCCAGCAAGAGCACAAGAGGAAGAGAGAGACCCTCACTGCTGGGGAGTCCCTGCCACACTCAGTCCCCCACCACACTGA - ATCTCCCCTCCTCACAGTTGCCATGTAGACCCCTTGAAGAGGGGAGGGGCCTAGG  1280 

AAGACCCCTGGACCACCAGCCCCAGTAAGAGCACAAGAGGAAGAGAGAGACCCTCACTGCTGGGGAGTCCCTGCCACACTCAGTCCCCCACCACACTGA - ATCTCCCCTCCTCACAGTTTCCATGTAGACCCCTTGAAGAGGGGAGGGGCCTAGG 

GGGACTCCTGGACCACCAACTCCAGCGAGAGCACAAGAGGAAGAGAGAGACCCTCACTGCTGGGGAGTTCCTGCCACACTCAGTCCCCCACCACACT - CCCCTCCTCACAGTTTCCATGCAGACCACTTGAAGAGGG-AGGGACCTAGG 

AAGACCCCTGAACCACCAGCCCCAGTGACAGCACAAGAGGAAAAGAGAGGCCCTCACTGCTGGGGAGTCCCTGCCACACTCAGTCCCCTGCCACAATGA - GAATCTCCCCTCCTCACAGTTTCCATGCAGACCCCCTGAAGAGGG-AGTGGCCTAGG 

AAGACCCCCTGACCACCAGCCCCAGCGAGAGCACCAGAGGAAGAGAGAGACCCTCACTGCTGGGGAGTCCCTGCCACACTCAGTCCCCCACCACACTAACTAAGAATCACCCCTCCTCACAGTTTCCATGCAGACACCCTGAAGAGGG-AGGGGCCTAGG 

AAGACCCCCAGACCACCAGCCCCAGTGAGAGCACAAGAGGAAGAGAGAGGCCTTCACTGCTGGGGG . . ACACTTAGTCCCCCTCCACACTGA - GAATCCCCCCTCCTCAGAGTTTCCATGCAGATCTCCTGAAGAGGG-AGGGGCCTAGG 

AAGACCCCTGGACCACCAGCCTCAGCGAGAGCACAGGAGGAAGAAAGAGGCCCTCACTGTTGGGGAGTCCCTGGCACACTCA-TTCCCCACCACACTGA - ATCTTCCCTCCTCACACATTCCATGCACACCCCTTGAAGAGGG-AGGGGCCTAGG 

AAGACCCCTGGACCACCAGCC-CAGCGACAGCATGAGAGGAAGACAGAAGCTCTTACTGCTGGGGAGTCCCTGCCATAATCAGTCCCCCACCAAGCTGA - ATCTCCCCTCCTCACAGTTTCCATGCAGACCCCTTGGAGAGGG-AGGGGCCTAGG 


Row labels: hGAPDH, ChrX, ChrX2, Chr6-1, Chr5, Chr13, Chr6-2, Chr15


GAGCCGCACCTTGTCATGTACCATCAATAAAGTACCCTGTGCTCAACC  1310 

GAGCCCCACCTTGTCATATACCATCAATAAAGTACCCTGTGCTCAGCC 

GAGCCCCACCTTGTCGTGTACCATCAGTAAAGTCCCCTGTGCTCAGCC 

GAGCCCCACCTTGTCTTGTACCATCAACAAAGTCCCCTGTGCTCAGCC 

GCGCCCCACCTTGTCATGTACCATCAATAAAGTCCCCTGTGCTCAGCC 

GAGCCCCACCTTGTCATGTACCATCAATAAAGTCCCCTGTGCTCA - 

GAGCCCCATGTTGTCGTGTACCATCAATAAAGTCCCCTGTGCTCAGCC 

GTGCCCCACCTTGTCATGTACCATCAATAAAGTGCCTTGTGCTCAGCC 


Figure 5. Alignment of the human GAPDH mRNA with the seven PGs that are most homologous to the wild-type GAPDH mRNA
shows that the first 26 nt of the mRNA is the only unique region, while the 685-705th nt region of GAPDH has the most
mismatches (shown as underlined lowercase letters in the PGs). We select the 6-25th and the 685-705th nt regions as the forward and reverse
primers (underlined), respectively.
doi:10.1371/journal.pone.0041659.g005
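
One way to locate a unique region such as the first 26 nt described in this caption is to scan the alignment for columns that no PG covers. The Python sketch below is illustrative only. It assumes that "unique" means the mRNA has a base while every PG row has a gap at that column, that rows are equal-length strings with '-' for gaps, and that the function name and toy sequences are hypothetical.

```python
def unique_regions(mrna_row, pg_rows, gap="-"):
    """Return half-open (start, end) column ranges covered by the mRNA only.

    A column counts as unique when the mRNA row has a base there and
    every PG row has a gap (assumption made for this sketch).
    """
    if any(len(row) != len(mrna_row) for row in pg_rows):
        raise ValueError("all alignment rows must have the same length")
    regions = []
    start = None
    for i, base in enumerate(mrna_row):
        is_unique = base != gap and all(row[i] == gap for row in pg_rows)
        if is_unique and start is None:
            start = i                      # a unique run begins here
        elif not is_unique and start is not None:
            regions.append((start, i))     # the run ended at the previous column
            start = None
    if start is not None:                  # run extends to the end of the alignment
        regions.append((start, len(mrna_row)))
    return regions


# Toy alignment (hypothetical): only the first two columns lack any PG base.
regions = unique_regions("ACGTACGT", ["---TACGT", "--GTACG-"])
```

Regions found this way are candidates only; whether one is long enough to hold a primer still has to be judged against the primer length required.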


primers, to increase the specificity. This strategy should also
enhance the preference for the authentic cDNA if, like the two
primers, the probe is assigned to a region containing
mismatches. However, it remains possible that, like the two
primers, the probe mis-anneals with some PGs because of the
high sequence similarity. It also merits mention that, because the
mouse genome varies greatly among strains [37], the real
risk of mis-priming may be higher in cells or tissues from some
strains of mice.

Specificity is an important criterion of primer design for PCR,
but hitherto it has been considered only vertically: the appearance of
additional band(s) above or below the expected one(s) in the same
lane of the agarose gel indicates non-specificity of the primer
pair, as seen in the PCR results from our ACTB and Actb primers
(Fig. 7A, 7D and 7E). We now propose to consider specificity
horizontally as well, as part of the SOP: a gDNA sample should be
included as a template in the PCR and in the ensuing gel
electrophoresis, so as to determine whether any processed PGs are
amplified. This is needed if one prefers not to determine
whether a PG is transcribed in the cell type or situation to be
studied. If the expected band also appears in the gDNA samples,
the primers need to be redesigned or complete DNase digestion
of the RNA sample needs to be ensured. As an example of the
horizontal criterion, the slight difference in the expected Actb band
between cDNA and gDNA samples can be discerned when they
are loaded into the agarose gel side by side: the
band from the gDNA samples is fuzzy and seems to be a mixture of bands
(Fig. 7D and 7E), suggesting that the primers may not be specific
horizontally, although they were selected with the best possible
consideration of PGs. A caveat is that it is generally
more difficult to perform PCR with gDNA as template, in part
because long chromatin DNA is highly wound and difficult to
denature by heating and to anneal with primers. This may be
one of the reasons why results from gDNA samples may differ
Chr4-1  AGAGACAGCCGCATCTTCTTGTGCAGTGCCAGCCTCGTCCCGTAGACAAAATGGTGAAGGTCGGTGTGAACGGATTTGGCCGTATTGGGCGCCTGGTCACCAGGGCTGCCATTTGCAGTGGCAAAGTGGAGATTGTTGCCATCAACGACCCCTTCATTGA
Chr15  AGAGACAGCCGCATCTTCTTGTGCAGTGCCAGCCTCGTCCCGTAGACAAAATGGTGAAGGTCGGTGTGAACGGATTTGGCCGTATTGGGCGCCTGGTCACCAGGGCTGCCATTTGCAGTGGCAAAGTGGAGATTGTTGCCATCAACGACCCCTTCATTGA
mGAPDH  AGAGACGGCCGCATCTTCTTGTGCAGTGCCAGCCTCGTCCCGTAGACAAAATGGTGAAGGTCGGTGTGAACGGATTTGGCCGTATTGGGCGCCTGGTCACCAGGGCTGCCATTTGCAGTGGCAAAGTGGAGATTGTTGCCATCAACGACCCCTTCATTGA  160 
ChrX  AGAGACAGCCGCATCTTCTTGTGCAGTGCCAGCCTCGTCCCGTAGACAAAATGGTGAAGGTCGGTGTGAACGGATTTGGCCGTATTGGGCGCCTGGTCACCAGGGCTGCCATTTGCAGTGGCAAAGTGGAGATTGTTGCCATCAACGACCCCTTCATTGA 
Chr5  AGAGACAGCCGCATCTTCTTGTGCAGTGCCAGCCTCGTCCCGTAGACAAAATGGTGAAGGTCGGTGTGAACGGATTTGGCCGTATTGGGCGCCTGGTCACCAGGGCTGCCATTTGCAGTGGCAAAGTGGAGATTGTTGCCATCAACGACCCCTTCATTGA 
Chr11  AGAGACAGCCGCATCTTCTTGTGCAGTGCCAGCCTCGTCCCGTAGACAAAATGGTGAAGGTCGGTGTGAACGGATTTGGCCGTATTGGGCGCCTGGTCACCAGGGCTGCCATTTGCAGTGGCAAAGTGGAGATTGTTGCCATCAACGACCCCTTCATTGA
Chr2  AGAGACAGCCGCATCTTCTTGTGCAGTGCCAGCCTCGTCCCGTAGACAAAATGGTGAAGGTCGGTGTGAACGGATTTGGCCGTATTGGGCGCCTGGTCACCAGGGCTGCCATTTGCAGTGGCAAAGTGGAGATTGTTGCCATCAACGACCCCTTCATTGA 
Chr4-2  AGAGACAGCCGCATCTTCTTGTGCAGTGCCAGCCTCGTCCCGTAGACAAAATGGTGAAGGTCGGTGTGAACGGATTTGGCCGTATTGGGCGCCTGGTCACCAGGGCTGCCATTTGCAGTGGCAAAGTGGAGATTGTTGCCATCAACGACCCCTTCATTGA 
Chr4-1  CCTCAACTACATGGTCTACATGTTCCAGTATGACTCCACTCACGGCAAATTCAACGGCACAGTCAAGGCCGAGAATGGGAAGCTTGTCATCAACGGGAAGCCCATCACCATCTTCCAGGAGCGAGACCCCACTAACATCAAATGGGGTGAGGCCGGTGCT 
Chr15  CCTCAACTACATGGTCTACATGTTCCAGTATGACTCCACTCACGGCAAATTCAACGGCACAGTCAAGGCCGAGAATGGGAAGCTTGTCATCAACGGGAAGCCCATCACCATCTTCCAGGAGCGAGACCCCACTAACATCAAATGGGGTGAGGCCGGTGCT
mGAPDH  CCTCAACTACATGGTCTACATGTTCCAGTATGACTCCACTCACGGCAAATTCAACGGCACAGTCAAGGCCGAGAATGGGAAGCTTGTCATCAACGGGAAGCCCATCACCATCTTCCAGGAGCGAGACCCCACTAACATCAAATGGGGTGAGGCCGGTGCT  320 
ChrX  CCTCAACTACATGGTCTACATGTTCCAGTATGACTCCACTCACGGCAAATTCAACGGCACAGTCAAGGCCGAGAATGGGAAGCTTGTCATCAACGGGAAGCCCATCACCATCTTCCAGGAGCGAGACCCCACTAACATCAAATGGGGTGAGGCCGGTGCT 
Chr5  CCTCAACTACATGGTCTACATGTTCCAGTATGACTCCACTCACGGCAAATTCAACGGCACAGTCAAGGCCGAGAATGGGAAGCTTGTCATCAACGGGAAGCCCATCACCATCTTCCAGGAGCGAGACCCCACTAACATCAAATGGGGTGAGGCCGGTGCT 
Chr11  CCTCAACTACATGGTCTACATGTTCCAGTATGACTCCACTCACGGCAAATTCAACGGCACAGTCAAGGCCGAGAATGGGAAGCTTGTCATCAACGGGAAGCCCATCACCATCTTCCAGGAGCGAGACCCCACTAACATCAAATGGGGTGAGGCCGGTGCT
Chr2  CCTCAACTACATGGTCTACATGTTCCAGTATGACTCCACTCACGGCAAATTCAACGGCACAGTCAAGGCCGAGAATGGGAAGCTTGTCATCAACGGGAAGCCCATCACCATCTTCCAGGAGCGAGACCCCACTAACATCAAATGGGGTGAGGCCGGTGCT
Chr4-2  CCTCAACTACATGGTCTACATGTTCCAGTATGACTCCACTCACGGCAAATTCAACGGCACAGTCAAGGCCGAGAATGGGAAGCTTGTCATCAACGGGAAGCCCATCACCATCTTCCAGGAGCGAGACCCCACTAACATCAAATGGGGTGAGGCCGGTGCT

Chr4-1  GAGTATGTCGTGGAGTCTACTGGTGTCTTCACCACCATGGAGAAGGCCGGGGCCCACTTGAAGGGTGGAGCCAAAAGGGTCATCATCTCCGCCCCTTCTGCCGATGCCCCCATGTTTGTGATGGGTGTGAACCACGAGAAATATGACAACTCACTCAAGA 
Chr15  GAGTATGTCGTGGAGTCTACTGGTGTCTTCACCACCATGGAGAAGGCCGGGGCCCACTTGAAGGGTGGAGCCAAAAGGGTCATCATCTCCGCCCCTTCTGCCGATGCCCCCATGTTTGTGATGGGTGTGAACCACGAGAAATATGACAACTCACTCAAGA
mGAPDH  GAGTATGTCGTGGAGTCTACTGGTGTCTTCACCACCATGGAGAAGGCCGGGGCCCACTTGAAGGGTGGAGCCAAAAGGGTCATCATCTCCGCCCCTTCTGCCGATGCCCCCATGTTTGTGATGGGTGTGAACCACGAGAAATATGACAACTCACTCAAGA  480 
ChrX  GAGTATGTCGTGGAGTCTACTGGTGTCTTCACCACCATGGAGAAGGCCGGGGCCCACTTGAAGGGTGGAGCCAAAAGGGTCATCATCTCCGCCCCTTCTGCCGATGCCCCCATGTTTGTGATGGGTGTGAACCACGAGAAATATGACAACTCACTCAAGA 
Chr5  GAGTATGTCGTGGAGTCTACTGGTGTCTTCACCACCATGGAGAAGGCCGGGGCCCACTTGAAGGGTGGAGCCAAAAGGGTCATCATCTCCGCCCCTTCTGCCGATGCCCCCATGATTGTGATGGGTGTGAACCACGAGAAATATGACAACTCACTCAAGA 
Chr11  GAGTATGTCGTGGAGTCTACTGGTGTCTTCACCACCATGGAGAAGGCCGGGGCCCACTTGAAGGGTGGAGCCAAAAGGGTCATCATCTCCGCCCCTTCTGCCGATGCCCCCATGTTTGTGATGGGTGTGAACCACGAGAAATATGACAACTCACTCAAGA
Chr2  GAGTATGTCGTGGAGTCTACTGGTGTCTTCACCACCATGGAGAAGGCCGGGGCCCACTTGAAGGGTGGAGCCAAACGGGTCATCATCTCCGCCCCTTCTGCCGATGCCCCCATGTTTGTGATGGGTGTGAACCACGAGAAATATGACAACTCACTCAAGA
Chr4-2  GAGTATGTCGTGGAGTCTACTGGTGTCTTCACCACCATGGAGAAGGCCGGGGCCCACTTGAAGGGTGGAGCCAAAAGGGTCATCATCTCCGCCCCTTCTGCCGATGCCCCCATGTTTGTGATGGGTGTGAACCACGAGAAATATGACAACTCACTCAAGA 

Chr4-1  TTGTCAGCAATGCATCCTGCACCACCAACTGCTTAGCCCCCCTGGCCAAGGTCATCCATGACAACTTTGGCATTGTGGAAGGGCTCATGACCACAGTCCATGCCATCACTGCCACCCAGAAGACTGTGGATGGCCCCTCTGGAAAGCTGTGGCGTGATGG 
Chr15  TTGTCAGCAATGCATCCTGCACCACCAACTGCTTAGCCCCCCTGGCCAAGGTCATCCATGACAACTTTGGCATTGTGGAAGGGCTCATGACCACAGTCCATGCCATCACTGCCACCCAGAAGACTGTGGATGGCCCCTCTGGAAAGCTGTGGTGTGATGG
mGAPDH  TTGTCAGCAATGCATCCTGCACCACCAACTGCTTAGCCCCCCTGGCCAAGGTCATCCATGACAACTTTGGCATTGTGGAAGGGCTCATGACCACAGTCCATGCCATCACTGCCACCCAGAAGACTGTGGATGGCCCCTCTGGAAAGCTGTGGCGTGATGG  640 
ChrX  TTGTCAGCAATGCATCCTGCACCACCAACTGCTTAGCCCCCCTGGCCAAGGTCATCCATGACAACTTTGGCATTGTGGAAGGGCTCATGACCACAGTCCATGCCATCACTGCCACCCAGAAGACTGTGGATGGCCCCTCTGGAAAGCTGTGGCGTGATGG 
Chr5  TTGTCAGCAATGCATCCTGCACCACCAACTGCTTAGCCCCCCTGGCCAAGGTCATCCATGACAACTTTGGCATTGTGGAAGGGCTCATGACCACAGTCCATGCCATCACTGCCACCCAGAAGACTGTGGATGGCCCCTCTGGAAAGCTGTGGCGTGATGG 
Chr11  TTGTCAGCAATGCATCCTGCACCACCAACTGCTTAGCCCCCCTGGCCAAGGTCATCCATGACAACTTTGGCATCGTGGAAGGGCTCATGACCACAGTCCATGCCATCACTGCCACCCAGAAGACTGTGGATGGCCCCTCTGGAAAGCTGTGGCGTGATGG
Chr2  TTGTCAGCAATGCATCCTGCACCACCAACTGCTTAGCCCCCCTGGCCAAGGTCATCCATGACAACTTTGGCATTGTGGAAGGGCTCATGACCACAGTCCATGCCATCACTGCCACCCAGAAGACTGTGGATGGCCCCTCTGGAAAGCTGTGGCGTGATGG 
Chr4-2  TTGTCAGCAATGCATCCTGCACCACCAACTGCTTAGCCCCCCTGGCCAAGGTCATCCATGACAATTTTGGCATTGTGGAAGGGCTCATGACCACAGTCCATGCCATCACTGCCACCCAGAAGACTGTGGATGGCCCCTCTGGAAAGCTGTGGCGTGATGG 


Chr4-1  CCGTGGGGCTGCCCAGAACATCATCCCTGCATCCACTGGTGCTGCCAAGGCTGTGGGCAAGGTCATCCCAGAGCTGAACGGGAAGCTCACTGGCATGGCCTTCCGTGTTCCTACCCCCAATGTGTCCGTCGTGGATCTGACGTGCCGCCTGGAGAAACCT 
Chr15  CCGTGGGGCTGCCCAGAACATCATCCCTGCATCCACTGGTGCTGCCAAGGCTGTGGGCAAGGTCATCCCAGAGCTGAACGGGAAGCTCACTGGCATGGCCTTCCGTGTTCCTACCCCCAATGTGTCCGTCGTGGATCTGACGTGCCGCCTGGAGAAACCT
mGAPDH  CCGTGGGGCTGCCCAGAACATCATCCCTGCATCCACTGGTGCTGCCAAGGCTGTGGGCAAGGTCATCCCAGAGCTGAACGGGAAGCTCACTGGCATGGCCTTCCGTGTTCCTACCCCCAATGTGTCCGTCGTGGATCTGACGTGCCGCCTGGAGAAACCT  800 
ChrX  CCGTGGGGCTGCCCAGAACATCATCCCTGCATCCACTGGTGCTGCCAAGGCTGTGGGCAAGGTCATCCCAGAGCTGAACGGGAAGCTCACTGGCATGGCCTTCCGTGTTCCTACCCCCAATGTGTCCGTCGTGGATCTGACGTGCCGCCTGGAGAAACCT 
Chr5  CCGTGGGGCTGCCCAGAACATCATCCCTGCATCCACTGGTGCTGCCAAGGCTGTGGGCAAGGTCATCCCAGAGCTGAACGGGAAGCTCACTGGCATGGCCTTCCGTGTTCCTACCCCCAATGTGTCCGTCGTGGATCTGACGTGCCGCCTGGAGAAACCT
Chr11  CCGTGGGGCTGCCCAGAACATCATCCCTGCATCCACTGGTGCTGCCAAGGCTGTGGGCAAGGTCATCCCAGAGCTGAACGGGAAGCTCACTGGCATGGCCTTCCGTGTTCCTACCCCCAATGTGTCCGTCGTGGATCTGACGTGCCGCCTGGAGAAACCT
Chr2  CCGTGGGGCTGCCCAGAACATCATCCCTGCATCCACTGGTGCTGCCAAGGCTGTGGGCAAGGTCATCCCAGAGCTGAACGGGAAGCTCACTGGCATGGCCTTCCGTGTTCCTACCCCCAATGTGTCCGTCGTGGATCTGACGTGCCGCCTGGAGAAACCT 
Chr4-2  CCGTGGGGCTGCCCAGAACATCATCCCTGCATCCACTGGTGCTGCCAAGGCTGTGGGCAAGGTCATCCCAGAGCTGAACGGGAAGCTCACTGGCATGGCCTTCCGTGTTCCTACCCCCAATGTGTCCGTCGTGGATCTGACGTGCCGCCTGGAGAAACCT 

Chr4-1  GCCAAGTATGATGACATCAAGAAGGTGGTGAAGCAGGCATCTGAGGGCCCACTGAAGGGCATCTTGGGCTACACTGAGGACCAGGTTGTCTCCTGCGACTTCAACAGCAACTCCCACTCTTCCACCTTCGATGCCGGGGCTGGCATTGCTCTCAATGACA 
Chr15  GCCAAGTATGATGACATCAAGAAGGTGGTGAAGCAGGCATCTGAGGGCCCACTGAAGGGCATCTTGGGCTACACTGAGGACCAGGTTGTCTCCTGCGACTTCAACAGCAACTCCCACTCTTCCACCTTCGATGCCGGGGCTGGCATTGCTCTCAATGACA

mGAPDH  GCCAAGTATGATGACATCAAGAAGGTGGTGAAGCAGGCATCTGAGGGCCCACTGAAGGGCATCTTGGGCTACACTGAGGACCAGGTTGTCTCCTGCGACTTCAACAGCAACTCCCACTCTTCCACCTTCGATGCCGGGGCTGGCATTGCTCTCAATGACA  960 
ChrX  GCCAAGGATGATGACATCAAGAAGGTGGTGAAGCAGGCATCTGAGGGCCCACTGAAGGGCATCTTGGGCTACACTGAGGACCAGGTTGTCTCCTGCGACTTCAACAGCAACTCCCACTCTTCCACCTTCGATGCCGGGGCTGGCATTGCTCTCAATGACA 
Chr5  GCCAAGTATGATGACATCAAGAAGGTGGTGAAGCAGGCATCTGAGGGCCCACTGAAGGGCATCTTGGGCTACACTGAGGACCAGGTTGTCTCCTGCGACTTCAACAGCAACTCCCACTCTTCCACCTTCGATGCCGGGGCTGGCATTGCTCTCAATGACA 
Chr11  GCCAAGTATGATGACATCAAGAAGGTGGTGAAGCAGGCATCTGAGGGCCCACTGAAGGGCATCTTGGGCTACACTGAGGACCAGGTTGTCTCCTGCGACTTCAACAGCAACTCCCACTCTTCCACCTTCGATGCCGGGGCTGGCATTGCTCTCAATGACA
Chr2  GCCAAGTATGATGACATCAAGAAGGTGGTGAAGCAGGCATCTGAGGGCCCACTGAAGGGCATCTTGGGCTACACTGAGGACCAGGTTGTCTCCTGCGACTTCAACAGCAACTCCCACTCTTCCACCTTCGATGCCGGGGCTGGCATTGCTCTCAATGACA
Chr4-2  GCCAAGTATGATGACATCAAGAAGGTGGTGAAGCAGGCATCTGAGGGCCCACTGAAGGGCATCTTGGGCTACACTGAGGACCAGGTTGTCTCCTGCGACTTCAACAGCAACTCCCACTCTTCCACCTTCGATGCCGGGGCTGGCATTGCTCTCAATGACA

Chr4-1  ACTTTGTCAAGCTCATTTCCTGGTATGACAATGAATACGGCTACAGCAACAGGGTGGTGGACCTCATGGCCTACATGGCCTCCAAGGAGTAAGAAACCCTGGACCACCCACCCCAGCAAGGACACTGAGCAAGAGAGAGGCCCTATCCCAACTCGGCCCC 
Chr15  ACTTTGTCAAGCTCATTTCCTGGTATGACAATGAATACGGCTACAGCAACAGGGTGGTGGACCTCATGGCCTACATGGCCTCCAAGGAGTAAGAAACCCTGGACCACCCACCCCAGCAAGGACACTGAGCAAGAGAGAGGCCCTATCCCAACTCGGCCCC
mGAPDH  ACTTTGTCAAGCTCATTTCCTGGTATGACAATGAATACGGCTACAGCAACAGGGTGGTGGACCTCATGGCCTACATGGCCTCCAAGGAGTAAGAAACCCTGGACCACCCACCCCAGCAAGGACACTGAGCAAGAGAG--GCCCTATCCCAACTCGGCCCC  1120
ChrX  ACTTTGTCAAGCTCATTTCCTGGTATGACAATGAATACGGCTACAGCAACAGGGTGGTGGACCTCATGGCCTACATGGCCTCCAAGGAGTAAGAAACCCTGGACCACCCACCCCAGCAAGGACACTGAGCAAGAGAGAGGCCCTATCCCAACTCGGCCCC
Chr5  ACTTTGTCAAGCTCATTTCCTGGTATGACAATGAATACGGCTACAGCAACAGGGTGGTGGACCTCATGGCCTACATGGCCTCCAAGGAGTAAGAAACCCTGGACCACCCACCCCAGCAAGGACACTGAGCAAGAGAG--GCCCTATCCCAACTCGGCCCC
Chr11  ACTTTGTCAAGCTCATTTCCTGGTATGACAATGAATACGGCTACAGCAACAGGGTGGTGGACCTCATGGCCTACATGGCCTCCAAGGAGTAAGAAACCCTGGACCACCCACCCCAGCAAGGACACTGAGCAAGAGAG--GCCCTATCCCAACTCGGCCCC
Chr2  ACTTTGTCAAGCTCATTTCCTGGTATGACAATGAATACGGCTACAGCAACAGGGTGGTGGACCTCATGGCCTACATGGCCTCCAAGGAGTAAGAAACCCTGGACCACCCACCCCAGCAAGGACACTGAGCAAGAGA--GGCCCTATCCCAACTCGGCCCC
Chr4-2  ACTTTGTCAAGCTCATTTCCTGGTATGACAATGAATACGGCTACAGCAACAGGGTGGTGGACCTCATGGCCTACATGGCCTCCAAGGAGTAAGAAACCCTGGACCACCCACCCCAGCAAGGACACTGAGCAAGAGAAAGGCCCTATCCCAACTCGGCCCC

Chr4-1  CAACACTGAGCATCTCCCTCACAATTTCCATCCCAGACCCCCATAATAACAGGAGGGGCCTAGGGAGCCCTCCCTACTCTCTTGAATACCATCAATAAAGTTCGCTGCACCCA- 
Chr15  CAACACTGAGCATCTCCCTCACAATTTCCATCCCAGACCCCCATAATAACAGGAGGGGCCTAGGGAGCCCTCCCTACTCTCTTGAATACCATCAATAAAGTTCGCTGCACCCA-

mGAPDH  CAACACTGAGCATCTCCCTCACAATTTCCATCCCAGACCCCCATAATAACAGGAGGGGCCTAGGGAGCCCTCCCTACTCTCTTGAATACCATCAATAAAGTTCGCTGCACCCAC  1232 
ChrX  CAACACTGAGCATCTCCCTCACAATTTCCATCCCAGACCCCCATAATAACAGGAGGGGCCTAGGGAGCCCTCCCTACTCTCTTGAATACCATCAATAAAGTTCGCTGCACCCAC 
Chr5  CAACACTGAGCATCTCCCTCACAATTTCCATCCCAGACCCCCATAATAACAGGAGGGGCCTAGGGAGCCCTCCCTACTCTCTTGAATACCATCAATAAAGTTCGCTGCACCCA-
Chr11  CAACACTGAGCATCTCCCTCACAATTTCCATCCCAGACCCCCATAATAACAGGAGGGGCCTAGGGAGCCCTCCCTACTCTCTTGAATACCATCAATAAAGTTCGCTGCACCCA-
Chr2  CAACACTGAGCATCTCCCTCACAATTTCCATCCCAGACCCCCATAATAACAGGAGGGGCCTAGGGAGCCCTCCCTACTCTCTTGAATACCATCAATAAAGTTCGCTGCACCCA-
Chr4-2  CAACACTGAGCATCTCCCTCACAATTTCCATCCCAGACCCCCATAATAACAGGAGGGGCCTAGGGAGCCCTCCCTACTCTCTTGAATACCATCAATAAAGTTCGCTGCACCCA-

Figure 6. Alignment of the mouse Gapdh mRNA with the seven PGs most homologous to it shows that
these PGs are almost identical to Gapdh, with only a few mismatches. No region of Gapdh can be used as a primer that
significantly discriminates against any of the PGs.
doi:10.1371/journal.pone.0041659.g006

among different cell lines for our ACTB and GAPDH primers and
why our ACTB primers, both of which are located in the same exon
(exon 6), seem to amplify cDNA samples more easily than gDNA
samples (Fig. 7A). PCR results for ACTB (Actb) and GAPDH
(Gapdh) shown in the literature, including those from us
previously, may not be specific if evaluated horizontally. Indeed,
we randomly performed bioinformatic analyses of quite a few
PCR primers for these genes reported in the three most prestigious
journals, i.e. Nature, Science and Cell; without exception, all those
primers match well and amplify the corresponding PGs by in silico
PCR. Whether or not the previously published RT-PCR results
that involve these genes (especially Gapdh) as the reference
need to be reevaluated or reinterpreted is an uncomfortable but
unavoidable question, and should be left to the corresponding
authors to decide.

An additional reason to abandon ACTB is that the DNA
fragment amplified by our primer pair, which is the only one we
could find that takes horizontal specificity into consideration, is only
246 bp long (table 2). This size may be suitable for real-time RT-PCR
but is too short to sensitively reflect a difference in the copy
number of the ACTB mRNA if the RT-PCR products are
evaluated by visualization in an agarose gel, because more copies
of a small DNA fragment are required to reach a visible amount of
nucleotides in a gel. For instance, one more copy of a 246-bp
double-stranded DNA fragment adds only 492 nucleotides to the
gel, which is not as visible as one more copy of a 1-kb DNA
fragment, which adds 2,000 nucleotides. For this technical reason, we
usually set, if possible, primer pairs to amplify DNA fragments of
400-700 bp and to minimize the difference in fragment size
between the reference gene and the gene of interest.

In summary, in this study we provide bioinformatic data
showing that the genes encoding β-actin and glyceraldehyde-3-
phosphate dehydrogenase have many PGs in the human and
mouse genomes. These PGs may affect the fidelity of ACTB (Actb)
or GAPDH (Gapdh) as a reference in RT-PCR through their genomic
DNA or, if some of them are expressed, through their RNA transcripts,
because a large copy number in the genome may amplify the
artifact derived from residual genomic DNA in the RNA sample
during PCR, whereas their RNA transcripts may contribute to the
yield of RT-PCR. We suggest that peers forgo these genes,
especially Gapdh, as a reference in RT-PCR or, if there is no
suitable surrogate, use with extra caution our primers and PCR
conditions provided herein, which may better avoid mis-priming
PGs relative to most primers described in the literature. We also
propose an SOP in which the design of primers for RT-PCR starts
from avoiding mis-priming of PGs and all primers need to be tested
Figure 7. Determination of the primer specificity for PCR both vertically and horizontally. A-C: gDNA (G) and cDNA (C) samples from a
panel of human cell lines (described in Materials and Methods) were amplified by PCR with an initial denaturation at 95°C for 5 min and 40
cycles of melting at 95°C for 30 sec, primer annealing at 58°C for 30 sec, and elongation at 72°C for 30 sec. The reaction was terminated at 72°C for
10 min. M is the molecular weight marker. D and E: gDNA (G) and cDNA (C) samples from M8 and ND5 mouse cell lines were amplified by PCR under the
conditions described above. Stars indicate the authentic Actb cDNA band whereas arrows indicate its counterpart from gDNA samples.
doi:10.1371/journal.pone.0041659.g007


with  not  only  cDNA  but  also  gDNA  to  ensure  their  specificity  both 
vertically  and  horizontally. 
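
The fragment-size argument made in this discussion is simple arithmetic: each copy of a double-stranded fragment contributes twice its length in nucleotides to a band. A minimal sketch, with a hypothetical helper name, reproduces the two figures quoted in the text and estimates how many copies of the short fragment carry as many nucleotides as one copy of the long one.

```python
def nucleotides_per_copy(length_bp: int) -> int:
    """Nucleotides contributed by one double-stranded copy of a fragment."""
    return 2 * length_bp


short_bp, long_bp = 246, 1000  # fragment sizes quoted in the text

print(nucleotides_per_copy(short_bp))  # 492
print(nucleotides_per_copy(long_bp))   # 2000

# Copies of the 246-bp fragment needed to equal one copy of the 1-kb fragment.
copies_needed = long_bp / short_bp
print(round(copies_needed, 2))  # 4.07
```

This counts nucleotides only; it does not model dye binding or gel imaging, so it should be read as an illustration of the size effect rather than a prediction of band intensity.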
Figure S1 ACTB, Actb, GAPDH, Gapdh, HPRT1 and Hprt
mRNA sequences retrieved from the NCBI database, which
presents all mRNAs as DNA sequences, i.e. with thymine replacing
uracil.

(DOC) 

Figure  S2  Putative  PGs  of  the  ACTB  identified  by  Blat  search 
using  the  ACTB  mRNA  sequence  (after  deletion  of  the  poly-A 
tail).  The  top  sequence  that  has  100%  identity  to  the  bait  is  the 
authentic  ACTB  gene  on  chromosome  7.  The  six  genomic  DNA 
fragments  that  have  the  highest  scores  to  the  bait  were  used  in  the 
alignment  with  the  bait  sequence  shown  in  figure  3. 

(DOC) 

Figure  S3  Putative  PGs  of  the  Actb  identified  by  Blat  search 
using  the  Actb  mRNA  sequence.  The  top  sequence  that  has  100% 
identity  to  the  bait  is  the  authentic  Actb  gene  on  mouse 
chromosome  5.  The  six  genomic  DNA  fragments  in  the  red  box 
that  have  the  highest  scores  to  the  bait  were  used  in  the  alignment 
with  the  bait  sequence  shown  in  figure  4. 

(DOC) 


Figure  S4  Putative  PGs  of  the  GAPDH  identified  by  Blat  search 
using  the  GAPDH  mRNA  sequence  (after  deletion  of  the  poly-A 
tail).  The  top  sequence  that  has  100%  identity  to  the  bait  is  the 
authentic  GAPDH  gene  on  human  chromosome  12.  The  seven 
genomic  DNA  fragments  in  the  red  box  that  have  the  highest 
scores  to  the  bait  were  used  in  the  alignment  with  the  bait 
sequence  shown  in  figure  5. 

(DOC)

Figure  S5  Putative  PGs  of  the  Gapdh  identified  by  Blat  search 
using  the  Gapdh  mRNA  sequence  (after  deletion  of  the  poly-A 
tail).  The  top  sequence  that  has  100%  identity  to  the  bait  is  the 
authentic  Gapdh  gene  on  the  mouse  chromosome  7.  The  seven 
genomic  DNA  fragments  in  the  red  box  that  have  the  highest 
scores  to  the  bait  were  used  in  the  alignment  with  the  bait 
sequence  shown  in  figure  6. 

(DOC)
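
The selection rule stated in these captions (set aside the hit with 100% identity, which is the authentic gene, and keep the highest-scoring remaining fragments) can be sketched as follows. The record layout and the example hits are hypothetical; they are not rows taken from the actual Blat output.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    score: int
    identity: float  # percent identity to the bait sequence
    chrom: str
    start: int
    end: int

    @property
    def span(self) -> int:
        # Genomic span in bases, counting both end coordinates.
        return self.end - self.start + 1


def top_pseudogene_hits(hits: list[Hit], n: int) -> list[Hit]:
    """Drop the 100%-identity hit(s) and return the n highest-scoring others."""
    candidates = [h for h in hits if h.identity < 100.0]
    return sorted(candidates, key=lambda h: h.score, reverse=True)[:n]


# Hypothetical hits for illustration only.
hits = [
    Hit(1300, 100.0, "chrA", 1000, 5000),
    Hit(900, 88.0, "chrB", 200, 1400),
    Hit(950, 91.5, "chrC", 50, 1300),
    Hit(400, 85.0, "chrD", 10, 500),
]
for hit in top_pseudogene_hits(hits, n=2):
    print(hit.chrom, hit.score, hit.span)
```

Filtering on identity alone is an assumption of this sketch; a PG that happened to match the bait perfectly would need to be told apart from the authentic gene by its chromosomal location.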
Human genes

Human  ACTB  mRNA: 

>gi|168480144|ref|NM_001101.3| Homo sapiens actin, beta (ACTB), mRNA:

ACCGCCGAGACCGCGTCCGCCCCGCGAGCACAGAGCCTCGCCTTTGCCGATCCGCCGCCCGTCCACACCCGCCGCCAGCTCACCATGGATGATGATATCGCCGCGCTCGTCG 
TCGACAACGGCTCCGGCATGTGCAAGGCCGGCTTCGCGGGCGACGATGCCCCCCGGGCCGTCTTCCCCTCCATCGTGGGGCGCCCCAGGCACCAGGGCGTGATGGTGGGCAT 
GGGTCAGAAGGATTCCTATGTGGGCGACGAGGCCCAGAGCAAGAGAGGCATCCTCACCCTGAAGTACCCCATCGAGCACGGCATCGTCACCAACTGGGACGACATGGAGAAA 
ATCTGGCACCACACCTTCTACAATGAGCTGCGTGTGGCTCCCGAGGAGCACCCCGTGCTGCTGACCGAGGCCCCCCTGAACCCCAAGGCCAACCGCGAGAAGATGACCCAGA 
TCATGTTTGAGACCTTCAACACCCCAGCCATGTACGTTGCTATCCAGGCTGTGCTATCCCTGTACGCCTCTGGCCGTACCACTGGCATCGTGATGGACTCCGGTGACGGGGT 
CACCCACACTGTGCCCATCTACGAGGGGTATGCCCTCCCCCATGCCATCCTGCGTCTGGACCTGGCTGGCCGGGACCTGACTGACTACCTCATGAAGATCCTCACCGAGCGC 
GGCTACAGCTTCACCACCACGGCCGAGCGGGAAATCGTGCGTGACATTAAGGAGAAGCTGTGCTACGTCGCCCTGGACTTCGAGCAAGAGATGGCCACGGCTGCTTCCAGCT 
CCTCCCTGGAGAAGAGCTACGAGCTGCCTGACGGCCAGGTCATCACCATTGGCAATGAGCGGTTCCGCTGCCCTGAGGCACTCTTCCAGCCTTCCTTCCTGGGCATGGAGTC 
CTGTGGCATCCACGAAACTACCTTCAACTCCATCATGAAGTGTGACGTGGACATCCGCAAAGACCTGTACGCCAACACAGTGCTGTCTGGCGGCACCACCATGTACCCTGGC 
ATTGCCGACAGGATGCAGAAGGAGATCACTGCCCTGGCACCCAGCACAATGAAGATCAAGATCATTGCTCCTCCTGAGCGCAAGTACTCCGTGTGGATCGGCGGCTCCATCC 
TGGCCTCGCTGTCCACCTTCCAGCAGATGTGGATCAGCAAGCAGGAGTATGACGAGTCCGGCCCCTCCATCGTCCACCGCAAATGCTTCTAGGCGGACTATGACTTAGTTGC 
GTTACACCCTTTCTTGACAAAACCTAACTTGCGCAGAAAACAAGATGAGATTGGCATGGCTTTATTTGTTTTTTTTGTTTTGTTTTGGTTTTTTTTTTTTTTTTGGCTTGAC 
TCAGGATTTAAAAACTGGAACGGTGAAGGTGACAGCAGTCGGTTGGAGCGAGCATCCCCCAAAGTTCACAATGTGGCCGAGGACTTTGATTGCACATTGTTGTTTTTTTAAT 
AGTCATTCCAAATATGAGATGCGTTGTTACAGGAAGTCCCTTGCCATCCTAAAAGCCACCCCACTTCTCTCTAAGGAGAATGGCCCAGTCCTCTCCCAAGTCCACACAGGGG 
AGGTGATAGCATTGCTTTCGTGTAAATTATGTAATGCAAAATTTTTTTAATCTTCGCCTTAATACTTTTTTATTTTGTTTTATTTTGAATGATGAGCCTTCGTGCCCCCCCT 
TCCCCCTTTTTTGTCCCCCAACTTGAGATGTATGAAGGCTTTTGGTCTCCCTGGGAGTGGGTGGAGGCAGCCAGGGCTTACCTGTACACTGACTTGAGACCAGTTGAATAAA 
AGTGCACACCTTAAAAATGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA


Human  GAPDH  mRNA: 

>gi|83641890|ref|NM_002046.4| Homo sapiens glyceraldehyde-3-phosphate dehydrogenase (GAPDH), mRNA:

GGCTGGGACTGGCTGAGCCTGGCGGGAGGCGGGGTCCGAGTCACCGCCTGCCGCCGCGCCCCCGGTTTCTATAAATTGAGCCCGCAGCCTCCCGCTTCGCTCTCTGCTCCTC 

CTGTTCGACAGTCAGCCGCATCTTCTTTTGCGTCGCCAGCCGAGCCACATCGCTCAGACACCATGGGGAAGGTGAAGGTCGGAGTCAACGGATTTGGTCGTATTGGGCGCCT 

GGTCACCAGGGCTGCTTTTAACTCTGGTAAAGTGGATATTGTTGCCATCAATGACCCCTTCATTGACCTCAACTACATGGTTTACATGTTCCAATATGATTCCACCCATGGC 

AAATTCCATGGCACCGTCAAGGCTGAGAACGGGAAGCTTGTCATCAATGGAAATCCCATCACCATCTTCCAGGAGCGAGATCCCTCCAAAATCAAGTGGGGCGATGCTGGCG 

CTGAGTACGTCGTGGAGTCCACTGGCGTCTTCACCACCATGGAGAAGGCTGGGGCTCATTTGCAGGGGGGAGCCAAAAGGGTCATCATCTCTGCCCCCTCTGCTGATGCCCC 

CATGTTCGTCATGGGTGTGAACCATGAGAAGTATGACAACAGCCTCAAGATCATCAGCAATGCCTCCTGCACCACCAACTGCTTAGCACCCCTGGCCAAGGTCATCCATGAC 

AACTTTGGTATCGTGGAAGGACTCATGACCACAGTCCATGCCATCACTGCCACCCAGAAGACTGTGGATGGCCCCTCCGGGAAACTGTGGCGTGATGGCCGCGGGGCTCTCC 

AGAACATCATCCCTGCCTCTACTGGCGCTGCCAAGGCTGTGGGCAAGGTCATCCCTGAGCTGAACGGGAAGCTCACTGGCATGGCCTTCCGTGTCCCCACTGCCAACGTGTC 

AGTGGTGGACCTGACCTGCCGTCTAGAAAAACCTGCCAAATATGATGACATCAAGAAGGTGGTGAAGCAGGCGTCGGAGGGCCCCCTCAAGGGCATCCTGGGCTACACTGAG 

CACCAGGTGGTCTCCTCTGACTTCAACAGCGACACCCACTCCTCCACCTTTGACGCTGGGGCTGGCATTGCCCTCAACGACCACTTTGTCAAGCTCATTTCCTGGTATGACA 

ACGAATTTGGCTACAGCAACAGGGTGGTGGACCTCATGGCCCACATGGCCTCCAAGGAGTAAGACCCCTGGACCACCAGCCCCAGCAAGAGCACAAGAGGAAGAGAGAGACC 

CTCACTGCTGGGGAGTCCCTGCCACACTCAGTCCCCCACCACACTGAATCTCCCCTCCTCACAGTTGCCATGTAGACCCCTTGAAGAGGGGAGGGGCCTAGGGAGCCGCACC 

TTGTCATGTACCATCAATAAAGTACCCTGTGCTCAACCAAAAAAAAAAAAAAAAAAA

Human GAPDH variant 2; NM_001256799.1:

GTGCGCAGCGGGTGCATCCCTGTCCGGATGCTGCGCCTGCGGTAGAGCGGCCGCCATGTTGCAACCGGGAAGGAAATGAATGGGCAGCCGTTAGGAAAGCCTGCCGGTGACT 

AACCCTGCGCTCCTGCCTCGATGGGTGGAGTCGCGTGTGGCGGGGAAGTCAGGTGGAGCGAGGCTAGCTGGCCCGATTTCTCCTCGGGTGATGCTTTTCCTAGATTATTCTC 

TGATTTGGTCGTATTGGGCGCCTGGTCACCAGGGCTGCTTTTAACTCTGGTAAAGTGGATATTGTTGCCATCAATGACCCCTTCATTGACCTCAACTACATGGTTTACATGT 

TCCAATATGATTCCACCCATGGCAAATTCCATGGCACCGTCAAGGCTGAGAACGGGAAGCTTGTCATCAATGGAAATCCCATCACCATCTTCCAGGAGCGAGATCCCTCCAA 

AATCAAGTGGGGCGATGCTGGCGCTGAGTACGTCGTGGAGTCCACTGGCGTCTTCACCACCATGGAGAAGGCTGGGGCTCATTTGCAGGGGGGAGCCAAAAGGGTCATCATC 

TCTGCCCCCTCTGCTGATGCCCCCATGTTCGTCATGGGTGTGAACCATGAGAAGTATGACAACAGCCTCAAGATCATCAGCAATGCCTCCTGCACCACCAACTGCTTAGCAC 

CCCTGGCCAAGGTCATCCATGACAACTTTGGTATCGTGGAAGGACTCATGACCACAGTCCATGCCATCACTGCCACCCAGAAGACTGTGGATGGCCCCTCCGGGAAACTGTG 

GCGTGATGGCCGCGGGGCTCTCCAGAACATCATCCCTGCCTCTACTGGCGCTGCCAAGGCTGTGGGCAAGGTCATCCCTGAGCTGAACGGGAAGCTCACTGGCATGGCCTTC 

CGTGTCCCCACTGCCAACGTGTCAGTGGTGGACCTGACCTGCCGTCTAGAAAAACCTGCCAAATATGATGACATCAAGAAGGTGGTGAAGCAGGCGTCGGAGGGCCCCCTCA 

AGGGCATCCTGGGCTACACTGAGCACCAGGTGGTCTCCTCTGACTTCAACAGCGACACCCACTCCTCCACCTTTGACGCTGGGGCTGGCATTGCCCTCAACGACCACTTTGT 

CAAGCTCATTTCCTGGTATGACAACGAATTTGGCTACAGCAACAGGGTGGTGGACCTCATGGCCCACATGGCCTCCAAGGAGTAAGACCCCTGGACCACCAGCCCCAGCAAG 

AGCACAAGAGGAAGAGAGAGACCCTCACTGCTGGGGAGTCCCTGCCACACTCAGTCCCCCACCACACTGAATCTCCCCTCCTCACAGTTGCCATGTAGACCCCTTGAAGAGG 

GGAGGGGCCTAGGGAGCCGCACCTTGTCATGTACCATCAATAAAGTACCCTGTGCTCAACCAAAAAAAAAAAAAAAAAAA 

Human  HPRT1  mRNA: 

>gi|164518913|ref|NM_000194.2| Homo sapiens hypoxanthine phosphoribosyltransferase 1 (HPRT1), mRNA:

GGCGGGGCCTGCTTCTCCTCAGCTTCAGGCGGCTGCGACGAGCCCTCAGGCGAACCTCTCGGCTTTCCCGCGCGGCGCCGCCTCTTGCTGCGCCTCCGCCTCCTCCTCTGCT 

CCGCCACCGGCTTCCTCCTCCTGAGCAGTCAGCCCGCGCGCCGGCCGGCTCCGTTATGGCGACCCGCAGCCCTGGCGTCGTGATTAGTGATGATGAACCAGGTTATGACCTT 

GATTTATTTTGCATACCTAATCATTATGCTGAGGATTTGGAAAGGGTGTTTATTCCTCATGGACTAATTATGGACAGGACTGAACGTCTTGCTCGAGATGTGATGAAGGAGA

TGGGAGGCCATCACATTGTAGCCCTCTGTGTGCTCAAGGGGGGCTATAAATTCTTTGCTGACCTGCTGGATTACATCAAAGCACTGAATAGAAATAGTGATAGATCCATTCC 

TATGACTGTAGATTTTATCAGACTGAAGAGCTATTGTAATGACCAGTCAACAGGGGACATAAAAGTAATTGGTGGAGATGATCTCTCAACTTTAACTGGAAAGAATGTCTTG 

ATTGTGGAAGATATAATTGACACTGGCAAAACAATGCAGACTTTGCTTTCCTTGGTCAGGCAGTATAATCCAAAGATGGTCAAGGTCGCAAGCTTGCTGGTGAAAAGGACCC 

CACGAAGTGTTGGATATAAGCCAGACTTTGTTGGATTTGAAATTCCAGACAAGTTTGTTGTAGGATATGCCCTTGACTATAATGAATACTTCAGGGATTTGAATCATGTTTG 

TGTCATTAGTGAAACTGGAAAAGCAAAATACAAAGCCTAAGATGAGAGTTCAAGTTGAGTTTGGAAACATCTGGAGTCCTATTGACATCGCCAGTAAAATTATCAATGTTCT 

AGTTCTGTGGCCATCTGCTTAGTAGAGCTTTTTGCATGTATCTTCTAAGAATTTTATCTGTTTTGTACTTTAGAAATGTCAGTTGCTGCATTCCTAAACTGTTTATTTGCAC 


TATGAGCCTATAGACTATCAGTTCCCTTTGGGCGGATTGTTGTTTAACTTGTAAATGAAAAAATTCTCTTAAACCACAGCACTATTGAGTGAAACATTGAACTCATATCTGT 
AAGAAATAAAGAGAAGATATATTAGTTTTTTAATTGGTATTTTAATTTTTATATATGCAGGAAAGAATAGAAGTGATTGAATATTGTTAATTATACCACCGTGTGTTAGAAA
AGTAAGAAGCAGTCAATTTTCACATCAAAGACAGCATCTAAGAAGTTTTGTTCTGTCCTGGAATTATTTTAGTAGTGTTTCAGTAATGTTGACTGTATTTTCCAACTTGTTC 
AAATTATTACCAGTGAATCTTTGTCAGCAGTTCCCTTTTAAATGCAAATCAATAAATTCCCAAAAATTTAAAAAAAAAAAAAAAAAAAAAA

Mouse  genes 

Mouse  Actb  mRNA: 

>gi|145966868|ref|NM_007393.3| Mus musculus actin, beta (Actb), mRNA:

CTGTCGAGTCGCGTCCACCCGCGAGCACAGCTTCTTTGCAGCTCCTTCGTTGCCGGTCCACACCCGCCACCAGTTCGCCATGGATGACGATATCGCTGCGCTGGTCGTCGAC 
AACGGCTCCGGCATGTGCAAAGCCGGCTTCGCGGGCGACGATGCTCCCCGGGCTGTATTCCCCTCCATCGTGGGCCGCCCTAGGCACCAGGGTGTGATGGTGGGAATGGGTC 
AGAAGGACTCCTATGTGGGTGACGAGGCCCAGAGCAAGAGAGGTATCCTGACCCTGAAGTACCCCATTGAACATGGCATTGTTACCAACTGGGACGACATGGAGAAGATCTG 
GCACCACACCTTCTACAATGAGCTGCGTGTGGCCCCTGAGGAGCACCCTGTGCTGCTCACCGAGGCCCCCCTGAACCCTAAGGCCAACCGTGAAAAGATGACCCAGATCATG 
TTTGAGACCTTCAACACCCCAGCCATGTACGTAGCCATCCAGGCTGTGCTGTCCCTGTATGCCTCTGGTCGTACCACAGGCATTGTGATGGACTCCGGAGACGGGGTCACCC 
ACACTGTGCCCATCTACGAGGGCTATGCTCTCCCTCACGCCATCCTGCGTCTGGACCTGGCTGGCCGGGACCTGACAGACTACCTCATGAAGATCCTGACCGAGCGTGGCTA 
CAGCTTCACCACCACAGCTGAGAGGGAAATCGTGCGTGACATCAAAGAGAAGCTGTGCTATGTTGCTCTAGACTTCGAGCAGGAGATGGCCACTGCCGCATCCTCTTCCTCC 
CTGGAGAAGAGCTATGAGCTGCCTGACGGCCAGGTCATCACTATTGGCAACGAGCGGTTCCGATGCCCTGAGGCTCTTTTCCAGCCTTCCTTCTTGGGTATGGAATCCTGTG 
GCATCCATGAAACTACATTCAATTCCATCATGAAGTGTGACGTTGACATCCGTAAAGACCTCTATGCCAACACAGTGCTGTCTGGTGGTACCACCATGTACCCAGGCATTGC 
TGACAGGATGCAGAAGGAGATTACTGCTCTGGCTCCTAGCACCATGAAGATCAAGATCATTGCTCCTCCTGAGCGCAAGTACTCTGTGTGGATCGGTGGCTCCATCCTGGCC 
TCACTGTCCACCTTCCAGCAGATGTGGATCAGCAAGCAGGAGTACGATGAGTCCGGCCCCTCCATCGTGCACCGCAAGTGCTTCTAGGCGGACTGTTACTGAGCTGCGTTTT 
ACACCCTTTCTTTGACAAAACCTAACTTGCGCAGAAAAAAAAAAAATAAGAGACAACATTGGCATGGCTTTGTTTTTTTAAATTTTTTTTAAAGTTTTTTTTTTTTTTTTTT
TTTTTTTTTTTAAGTTTTTTTGTTTTGTTTTGGCGCTTTTGACTCAGGATTTAAAAACTGGAACGGTGAAGGCGACAGCAGTTGGTTGGAGCAAACATCCCCCAAAGTTCTA 
CAAATGTGGCTGAGGACTTTGTACATTGTTTTGTTTTTTTTTTTTTTTGGTTTTGTCTTTTTTTAATAGTCATTCCAAGTATCCATGAAATAAGTGGTTACAGGAAGTCCCT 
CACCCTCCCAAAAGCCACCCCCACTCCTAAGAGGAGGATGGTCGCGTCCATGCCCTGAGTCCACCCCGGGGAAGGTGACAGCATTGCTTCTGTGTAAATTATGTACTGCAAA 
^TTTTTTT;^TCTTCCGCCTT^TACTTCATTTTTGTTTTT^TTTCTG^TGGCCCAGGTCTGAGGCCTCCCTTTTTTTTGTCCCCCC^CTTGATGTATGAAGGCTTT 
GGTCTCCCTGGGAGGGGGTTGAGGTGTTGAGGCAGCCAGGGCTGGCCTGTACACTGACTTGAGACCAATAAAAGTGCACACCTTACCTTACACAAAC 

Mouse  Gapdh  mRNA: 

>gi|126012538|ref|NM_008084.2| Mus musculus glyceraldehyde-3-phosphate dehydrogenase (Gapdh), mRNA:

AGAGACGGCCGCATCTTCTTGTGCAGTGCCAGCCTCGTCCCGTAGACAAAATGGTGAAGGTCGGTGTGAACGGATTTGGCCGTATTGGGCGCCTGGTCACCAGGGCTGCCAT 

TTGCAGTGGCAAAGTGGAGATTGTTGCCATCAACGACCCCTTCATTGACCTCAACTACATGGTCTACATGTTCCAGTATGACTCCACTCACGGCAAATTCAACGGCACAGTC 

AAGGCCGAGAATGGGAAGCTTGTCATCAACGGGAAGCCCATCACCATCTTCCAGGAGCGAGACCCCACTAACATCAAATGGGGTGAGGCCGGTGCTGAGTATGTCGTGGAGT 

CTACTGGTGTCTTCACCACCATGGAGAAGGCCGGGGCCCACTTGAAGGGTGGAGCCAAAAGGGTCATCATCTCCGCCCCTTCTGCCGATGCCCCCATGTTTGTGATGGGTGT 

GAACCACGAGAAATATGACAACTCACTCAAGATTGTCAGCAATGCATCCTGCACCACCAACTGCTTAGCCCCCCTGGCCAAGGTCATCCATGACAACTTTGGCATTGTGGAA 

GGGCTCATGACCACAGTCCATGCCATCACTGCCACCCAGAAGACTGTGGATGGCCCCTCTGGAAAGCTGTGGCGTGATGGCCGTGGGGCTGCCCAGAACATCATCCCTGCAT 

CCACTGGTGCTGCCAAGGCTGTGGGCAAGGTCATCCCAGAGCTGAACGGGAAGCTCACTGGCATGGCCTTCCGTGTTCCTACCCCCAATGTGTCCGTCGTGGATCTGACGTG 

CCGCCTGGAGAAACCTGCCAAGTATGATGACATCAAGAAGGTGGTGAAGCAGGCATCTGAGGGCCCACTGAAGGGCATCTTGGGCTACACTGAGGACCAGGTTGTCTCCTGC 

GACTTCAACAGCAACTCCCACTCTTCCACCTTCGATGCCGGGGCTGGCATTGCTCTCAATGACAACTTTGTCAAGCTCATTTCCTGGTATGACAATGAATACGGCTACAGCA 

ACAGGGTGGTGGACCTCATGGCCTACATGGCCTCCAAGGAGTAAGAAACCCTGGACCACCCACCCCAGCAAGGACACTGAGCAAGAGAGGCCCTATCCCAACTCGGCCCCCA 

ACACTGAGCATCTCCCTCACAATTTCCATCCCAGACCCCCATAATAACAGGAGGGGCCTAGGGAGCCCTCCCTACTCTCTTGAATACCATCAATAAAGTTCGCTGCACCCAC 

AAAAAAAAAAAAAAAAAAAAAA 


Mouse  Hprt  mRNA: 

>gi|96975137|ref|NM_013556.2| Mus musculus hypoxanthine guanine phosphoribosyl transferase (Hprt), mRNA:

GGAGCCTGGCCGGCAGCGTTTCTGAGCCATTGCTGAGGCGGCGAGGGAGAGCGTTGGGCTTACCTCACTGCTTTCCGGAGCGGTAGCACCTCCTCCGCCGGCTTCCTCCTCA 

GACCGCTTTTTGCCGCGAGCCGACCGGTCCCGTCATGCCGACCCGCAGTCCCAGCGTCGTGATTAGCGATGATGAACCAGGTTATGACCTAGATTTGTTTTGTATACCTAAT 

CATTATGCCGAGGATTTGGAAAAAGTGTTTATTCCTCATGGACTGATTATGGACAGGACTGAAAGACTTGCTCGAGATGTCATGAAGGAGATGGGAGGCCATCACATTGTGG 

CCCTCTGTGTGCTCAAGGGGGGCTATAAGTTCTTTGCTGACCTGCTGGATTACATTAAAGCACTGAATAGAAATAGTGATAGATCCATTCCTATGACTGTAGATTTTATCAG 

ACTGAAGAGCTACTGTAATGATCAGTCAACGGGGGACATAAAAGTTATTGGTGGAGATGATCTCTCAACTTTAACTGGAAAGAATGTCTTGATTGTTGAAGATATAATTGAC 

ACTGGTAAAACAATGCAAACTTTGCTTTCCCTGGTTAAGCAGTACAGCCCCAAAATGGTTAAGGTTGCAAGCTTGCTGGTGAAAAGGACCTCTCGAAGTGTTGGATACAGGC 

CAGACTTTGTTGGATTTGAAATTCCAGACAAGTTTGTTGTTGGATATGCCCTTGACTATAATGAGTACTTCAGGGATTTGAATCACGTTTGTGTCATTAGTGAAACTGGAAA 

AGCCAAATACAAAGCCTAAGATGAGCGCAAGTTGAATCTGCAAATACGAGGAGTCCTGTTGATGTTGCCAGTAAAATTAGCAGGTGTTCTAGTCCTGTGGCCATCTGCCTAG 

TAAAGCTTTTTGCATGAACCTTCTATGAATGTTACTGTTTTATTTTTAGAAATGTCAGTTGCTGCGTCCCCAGACTTTTGATTTGCACTATGAGCCTATAGGCCAGCCTACC 

CTCTGGTAGATTGTCGCTTATCTTGTAAGAAAAACAAATCTCTTAAATTACCACTTTTAAATAATAATACTGAGATTGTATCTGTAAGAAGGATTTAAAGAGAAGCTATATT 

AGTTTTTTAATTGGTATTTTAATTTTTATATATTCAGGAGAGAAAGATGTGATTGATATTGTTAATTTAGACGAGTCTGAAGCTCTCGATTTCCTATCAGTAACAGCATCTA 

AGAGGTTTTGCTCAGTGGAATAAACATGTTTCAGCAGTGTTGGCTGTATTTTCCCACTTTCAGTAAATCGTTGTCAACAGTTCCTTTTAAATGCAAATAAATAAATTCTAAA 

AATTC 

Figure S1.
SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
?      ?       ?     ?      ?         12     +       63148933   63150512   1580
443    617     1302  1310   88.1%     5      +       35081321   35082293   973
903    96      1310  1310   88.3%     10     ?       57426899   57428117   1219
410    749     1295  1310   ?         10     ?       15135073   15135612   540
878    96      1295  1310   88.9%     8      ?       101562781  101563948  1168
408    812     1310  1310   90.9%     2      +       188280277  188280774  498
362    861     1308  1310   91.4%     4      ?       15494147   15494597   451
876    83      1310  1310   87.6%     2      +       38512533   38513736   1204
338    197     1289  1310   80.8%     Y      +       23023315   23024408   1094
874    144     1278  1310   90.3%     6      ?       135940134  135941268  1135
324    280     1157  1310   81.8%     4      ?       131424463  131425328  866
836    53      1293  1310   85.3%     1      -       120076188  120139823  63636
316    151     662   1310   82.1%     15     +       44646939   44647442   504
832    28      1295  1310   84.7%     1      +       215044007  215045249  1243
308    440     1178  1310   81.8%     3      +       179930237  179930972  736
831    27      1286  1310   85.6%     22     -       41069312   41070526   1215
290    471     894   1310   84.4%     1      -       120076588  120077008  421
817    97      1295  1310   86.2%     15     +       44355703   44648084   292382
289    904     1302  1310   86.4%     1      -       120138597  120138996  400
801    141     1255  1310   89.1%     1      -       92580209   92581324   1116
278    980     1310  1310   92.2%     13     -       45697946   45698277   332
793    151     1223  1310   87.1%     10     -       93426422   93427491   1070
275    29      493   1310   83.8%     13     -       99842870   99843315   446
788    134     1267  1310   87.2%     18     -       3977496    3978584    1089
237    1002    1310  1310   90.4%     X      -       86680390   86680694   305
769    83      1261  1310   85.0%     3      -       143221964  143223135  1172
234    1017    1310  1310   90.1%     X      -       15388930   15389224   295
758    130     1278  1310   85.8%     1      +       94767621   94768751   1131
208    30      1295  1310   83.1%     3      +       141443152  141443571  420

743 

127 

1302 

13  10 

87.8% 

1 

- 

52172598 

52173794 

1197 

741 

160 

1302 

13  10 

85. 1% 

9 

+ 

103738007 

103739132 

1126 

740 

149 

1309 

13  10 

83 . 6% 

6 

+ 

70455808 

70456947 

1140 

724 

163 

1310 

13  10 

85.2% 

12 

+ 

7719019 

7720174 

1156 

72  1 

150 

12  67 

13  10 

82.8% 

2 

- 

3735418 

3736530 

1113 

719 

157 

1302 

13  10 

86.5% 

1 

- 

32867509 

32868662 

1154 

Figure  S4: 


SCORE  START  END   QSIZE  IDENTITY  CHRO      STRAND  START      END        SPAN
1229   1      1232  1232   100.0%    7         +       59002017   59003250   1234
1228   1      1231  1232   100.0%    4         -       91518178   91519410   1233
1227   1      1232  1232   99.9%     X         -       77439008   77440241   1234
1227   1      1231  1232   99.9%     2         -       28788949   28790179   1231
1227   1      1231  1232   99.9%     11        -       20234836   20236066   1231
1227   1      1231  1232   99.9%     5         +       96290683   96291913   1231
1226   1      1231  1232   99.9%     4         -       57383644   57384876   1233
1226   1      1231  1232   99.9%     15        +       12504594   12505826   1233
1223   1      1232  1232   99.7%     11        +       26686583   26687816   1234
1220   1      1231  1232   99.6%     3         +       142625525  142626757  1233
1219   1      1230  1232   99.6%     6         -       83824193   83825424   1232
1219   1      1232  1232   99.6%     Y_random  +       25801587   25802820
1218   1      1231  1232   99.6%     8         -       95239645   95240877   1233
1217   12     1229  1232   100.0%    5         -       106291541  106292760  1220
1216   1      1231  1232   99.5%     2         -       11330476   11331708   1233
1215   1      1232  1232   99.4%     3         +       139312293  139313526  1234
1214   1      1231  1232   99.4%     8         +       60029049   60030281   1233
1212   1      1231  1232   99.3%     14        -       12113265   12114497   1233
1211   1      1232  1232   99.2%     5         -       13042251   13043484   1234
1211   11     1232  1232   99.6%     3         -       121932931  121934154  1224
1209   8      1231  1232   99.5%     8         -       77929236   77930461   1226
1201   1      1232  1232   98.9%     6         -       128151984  128153301  1318
1199   1      1232  1232   98.7%     1         -       15258698   15259928   1231
1199   1      1232  1232   98.8%     15        +       4835070    4836303    1234
1198   1      1231  1232   98.8%     7         -       107909138  107910370  1233
1198   11     1231  1232   99.1%     3         +       52416717   52417939   1223
1197   1      1231  1232   98.5%     17        +       14846013   14847240   1228
1194   1      1231  1232   98.6%     3         +       78721130   78722381   1252
1192   1      1232  1232   98.6%     11        -       63125610   63126863   1254
1191   1      1232  1232   98.5%     7         -       49380385   49381623   1239
1190   1      1230  1232   98.5%     15        +       93026976   93028228   1253
1186   12     1232  1232   98.5%     7         +       96756177   96757388   1212
1186   1      1231  1232   98.3%     6         +       91378827   91380059   1233
1185   1      1232  1232   98.2%     2         +       104131681  104132914  1234
1185   1      1230  1232   98.0%     14        +       49028253   49029473   1221
1184   1      1232  1232   98.2%     3         -       61265599   61266833   1235
1180   1      1232  1232   98.1%     14        -       103769503  103770713  1211
1175   1      1232  1232   97.8%     5         +       28865060   28866310   1251
1174   8      1231  1232   98.5%     13        -       99171850   99173081   1232
1165   1      1232  1232   97.7%     12        -       82472997   82474258   1262
1164   1      1232  1232   97.7%     X         +       148379862  148381113  1252
1152   11     1229  1232   97.6%     12        -       71021259   71022518   1260
1152   1      1231  1232   97.6%     19        +       40286822   40288065   1244
1151   1      1232  1232   97.2%     9         -       75541256   75542516   1261
1150   1      1231  1232   97.4%     2         +       149568861  149570099  1239
1143   1      1232  1232   96.7%     X         +       117274269  117275527  1259
1142   1      1232  1232   96.7%     4         +       83750749   83752009   1261
1140   1      1232  1232   96.8%     18        +       54143914   54145183   1270
1139   1      1232  1232   97.8%     2         -       43373802   43374968   1167
1136   12     1232  1232   96.9%     2         +       151199705  151200986  1282
1134   1      1228  1232   96.8%     8         -       47339840   47341082   1243
1133   1      1228  1232   97.1%     9         +       51430799   51432057   1259
1132   1      1232  1232   96.3%     6         +       84762184   84763444   1261
1131   1      1228  1232   97.0%     15        +       84696700   84697947   1248
1124   1      1232  1232   96.3%     1         -       182257100  182258364  1265
1123   1      1228  1232   96.3%     1         +       188784337  188785590  1254
1121   1      1232  1232   96.2%     X         -       85817186   85818448   1263
1121   1      1232  1232   96.0%     4         +       82839842   82841098   1257
1120   1      1232  1232   96.6%     13        -       49900113   49901357   1245
1117   12     1232  1232   96.2%     11        +       99539401   99540650   1250
1116   12     1232  1232   96.3%     8         -       42693822   42695062   1241
1115   1      1226  1232   96.5%     2         -       146580255  146581491  1237
1113   1      1230  1232   95.8%     19        -       25870102   25871347   1246
1112   11     1232  1232   96.4%     X         -       112572299  112573538  1240
1105   1      1232  1232   95.9%     6         +       22653394   22654619   1226
1103   11     1232  1232   96.3%     X         -       12902949   12904185   1237
1101   16     1232  1232   95.6%     11        -       48547853   48549098   1246
1096   1      1228  1232   95.9%     11        -       17873595   17874843   1249
1096   9      1232  1232   96.0%     10        +       39731630   39732841   1212
1094   14     1228  1232   95.4%     9         -       45643814   45645055   1242
1093   12     1232  1232   95.6%     1         -       13775764   13776999   1236
1093   1      1229  1232   95.1%     13        +       94063686   94064902   1217
1091   11     1232  1232   95.2%     9         -       109732688  109733935  1248
1091   11     1231  1232   95.4%     12        -       5596224    5597470    1247
1086   11     1232  1232   95.0%     10        -       13610255   13611499   1245
1085   1      1231  1232   94.5%     9         +       113439125  113440557  1433
1081   17     1228  1232   95.1%     9         -       51810182   51811420   1239
1075   17     1231  1232   95.3%     12        -       13759277   13760505   1229

(continuing  to  the  next  page) 


QSIZE  IDENTITY  CHRO      STRAND  START      END        SPAN
1232   95.5%     12        -       72947095   72948317   1223
1232   94.5%     19        +       16352237   16353466   1230
1232   95.9%     8         -       37435491   37436726   1236
1232   95.2%     10        +       121606393  121607622  1230
1232   93.8%     2         -       162807788  162809032  1245
1232   93.6%     16        +       64911059   64912307   1249
1232   94.4%     18        +       56131890   56133147   1258
1232   94.5%     10        +       21008986   21010231   1246
1232   94.1%     2         -       161079989  161081236  1248
1232   93.1%     14        +       56868866   56870107   1242
1232   93.4%     10        +       43541000   43542237   1238
1232   93.5%     7         -       80133834   80135067   1234
1232   93.1%     12        -       108443078  108444315  1238
1232   94.1%     15        -       43048495   43049742   1248
1232   93.0%     X         -       101082734  101084001  1268
1232   93.2%     7         +       34187798   34189037   1240
1232   98.6%     10        -       33528121   33529354   1234
1232   93.7%     4         +       41115634   41116824   1191
1232   92.1%     7         +       32664019   32665265   1247
1232   93.7%     6         +       86234857   86236081   1225
1232   92.0%     7         +       33389794   33391040   1247
1232   93.6%     19        -       46748576   46749820   1245
1232   93.4%     6         +       18094694   18095929   1236
1232   92.0%     18        +       9748681    9749925    1245
1232   92.6%     11        -       109020300  109021548  1249
1232   92.4%     9         +       37493479   37494710   1232
1232   91.9%     9         -       49993948   49995189   1242
1232   92.2%     7         +       32529172   33259475   730304
1232   94.8%     16        +       50111136   50112301   1166
1232   92.4%     12        +       50241204   50242434   1231
1232   92.6%     8         +       131157788  131159026  1239
1232   93.0%     4         -       128545466  128546678  1213
1232   93.1%     12        -       50941553   50942788   1236
1232   91.6%     2         +       11341812   11343063   1252
1232   92.1%     7         +       33258235   33577703   319469
1232   96.2%     13        -       98227915   98229017   1103
1232   91.3%     4         +       118659579  118660823  1245
1232   92.5%     2         +       65115467   65116681   1215
1232   91.3%     8         +       78817831   78819076   1246
1232   91.7%     X         -       ?          ?          1206
1232   92.6%     11        -       33175554   33176777   1224
1232   95.6%     17        -       25463357   25464472   1116
1232   91.6%     13        -       45281701   45282930   1230
1232   91.1%     6         -       13005395   13006631   1237
1232   93.6%     14        +       48846836   48848086   1251
1232   93.1%     9         -       83859391   83860601   1211
1232   92.4%     7         +       33330738   33804602   473865
1232   92.8%     6         +       94770485   94771715   1231
1232   91.7%     4         -       63323191   63324420   1230
1232   91.0%     7         +       32602089   32957914   355826
1232   91.8%     10        +       126750178  126751385  1208
1232   92.1%     13        +       69484500   69485716   1217
1232   91.8%     1         -       184322480  184323706  1227
1232   91.3%     13        +       12031775   12032998   1224
1232   93.1%     3         -       69370165   69371417   1253
1232   90.6%     16        +       49424879   49426119   1241
1232   93.7%     12        +       15734862   15736013   1152
1232   90.8%     14        +       41370492   41371733   1242
1232   89.8%     7         -       74927459   74928707   1249
1232   93.0%     8         +       83457533   83458668   1136

continued 

SCORE  START  END   QSIZE  IDENTITY  CHRO      STRAND  START      END
960    1      1229  1232   90.7%     18        +       48786308   48787563
960    13     1230  1232   91.4%     1         +       23486340   23487568
959    12     1136  1232   94.9%     10        +       22511098   22512186
957    179    1232  1232   95.8%     15        +       18093635   18094718
948    11     1205  1232   91.3%     X         +       44824945   44826165
947    33     1221  1232   91.4%     2         -       16900259   16901476
947    99     1232  1232   92.6%     10        +       10332897   10334049
944    28     1232  1232   90.7%     2         +       45664969   45666172
941    11     1102  1232   94.3%     10        +       44725885   44726976
938    17     1163  1232   92.0%     X         -       96276606   96277775
937    12     1232  1232   90.8%     9         -       100525192  100526422
931    55     1224  1232   91.8%     14        +       33710909   33712109
929    11     1206  1232   91.3%     1         +       48938517   48939746
926    8      1160  1232   91.8%     7         +       37019268   37020416
926    14     1016  1232   97.1%     11        +       3851680    3852683
917    12     1204  1232   89.6%     14        -       59497894   59499104
916    16     1232  1232   91.1%     17        +       41482930   41484148
912    1      1102  1232   93.2%     8         -       41939125   41940431
910    1      1177  1232   89.2%     12        -       62454539   62455711
910    20     1232  1232   88.2%     16        +       74961454   74962697
909    11     1229  1232   89.4%     8         +       66256808   66258048
901    1      1012  1232   95.5%     15        -       10490319   10491329
894    46     1231  1232   89.8%     3         -       6450412    6451593
889    93     1220  1232   91.0%     7         +       33922179   33923319
888    93     1229  1232   91.1%     7         +       32956774   33331982
887    12     1221  1232   90.9%     18        +       23375092   23376308
882    30     1097  1232   92.7%     13        -       20796205   20797272
876    11     1063  1232   92.0%     8         -       100089492  100090532
875    12     1232  1232   90.1%     3         -       46888450   46889695
874    76     1232  1232   94.4%     1         +       12555683   12556670
866    233    1232  1232   95.0%     8         -       94212069   94213034
866    354    1232  1232   99.4%     17        -       61920156   61921036
859    11     1102  1232   89.9%     3         -       8615221